[jira] [Commented] (DRILL-5660) Drill 1.10 queries fail due to Parquet Metadata "corruption" from DRILL-3867

2017-07-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095779#comment-16095779
 ] 

ASF GitHub Bot commented on DRILL-5660:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/877#discussion_r128683724
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java ---
@@ -132,23 +135,57 @@ public static ParquetTableMetadata_v3 getParquetTableMetadata(FileSystem fs,
   }
 
   /**
-   * Get the parquet metadata for a directory by reading the metadata file
+   * Get the parquet metadata for the table by reading the metadata file
    *
-   * @param fs
+   * @param fs current file system
    * @param path The path to the metadata file, located in the directory that contains the parquet files
-   * @return
-   * @throws IOException
+   * @param metaContext metadata context
+   * @param formatConfig parquet format plugin configs
+   * @return parquet table metadata
+   * @throws IOException if metadata file can't be read or updated
    */
-  public static ParquetTableMetadataBase readBlockMeta(FileSystem fs, Path path, MetadataContext metaContext, ParquetFormatConfig formatConfig) throws IOException {
-    Metadata metadata = new Metadata(fs, formatConfig);
-    metadata.readBlockMeta(path, false, metaContext);
-    return metadata.parquetTableMetadata;
+  public static @Nullable ParquetTableMetadataBase readBlockMeta(FileSystem fs, Path path, MetadataContext metaContext,
+      ParquetFormatConfig formatConfig) {
+    if (metaContext.isMetaCacheFileCorrect) {
+      Metadata metadata = new Metadata(fs, formatConfig);
+      try {
+        metadata.readBlockMeta(path, false, metaContext);
+        return metadata.parquetTableMetadata;
+      } catch (IOException e) {
+        logger.error(e.toString());
+        metaContext.isMetaCacheFileCorrect = false;
+      }
+    }
+    logger.warn("Ignoring unsupported or corrupted metadata file version. Query performance may be slow. Make sure " +
+        "the cache file is up-to-date by running the REFRESH TABLE METADATA command");
+    return null;
   }
 
-  public static ParquetTableMetadataDirs readMetadataDirs(FileSystem fs, Path path, MetadataContext metaContext, ParquetFormatConfig formatConfig) throws IOException {
-    Metadata metadata = new Metadata(fs, formatConfig);
-    metadata.readBlockMeta(path, true, metaContext);
-    return metadata.parquetTableMetadataDirs;
+  /**
+   * Get the parquet metadata for all subdirectories by reading the metadata file
+   *
+   * @param fs current file system
+   * @param path The path to the metadata file, located in the directory that contains the parquet files
+   * @param metaContext metadata context
+   * @param formatConfig parquet format plugin configs
+   * @return parquet metadata for a directory
+   * @throws IOException if metadata file can't be read or updated
+   */
+  public static @Nullable ParquetTableMetadataDirs readMetadataDirs(FileSystem fs, Path path,
+      MetadataContext metaContext, ParquetFormatConfig formatConfig) {
+    if (metaContext.isMetaDirsCacheFileCorrect) {
+      Metadata metadata = new Metadata(fs, formatConfig);
+      try {
+        metadata.readBlockMeta(path, true, metaContext);
+        return metadata.parquetTableMetadataDirs;
+      } catch (IOException e) {
+        logger.error(e.toString());
+        metaContext.isMetaDirsCacheFileCorrect = false;
+      }
+    }
+    logger.warn("Ignoring corrupted metadata file. Query performance may be slow. Make sure the cache file" +
--- End diff --

Maybe move the message to a constant so it does not have to be maintained in 
two places? And maybe include the file or directory that is corrupt?
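A minimal sketch of that suggestion, assuming one shared template with a placeholder for the offending path. The class `MetadataCacheWarnings` and its members are hypothetical (not part of PR 877), and `java.util.logging` stands in for Drill's own logger only so the example is self-contained:

```java
import java.util.logging.Logger;

/** Hypothetical holder for the shared warning text (not part of PR 877). */
final class MetadataCacheWarnings {
  // One template used by both readBlockMeta() and readMetadataDirs(),
  // so the wording is maintained in a single place. %s is the cache file path.
  static final String CORRUPT_CACHE_WARNING =
      "Ignoring unsupported or corrupted metadata file %s. Query performance may be slow. "
      + "Make sure the cache file is up-to-date by running the REFRESH TABLE METADATA command";

  private static final Logger logger = Logger.getLogger(MetadataCacheWarnings.class.getName());

  /** Builds the warning text for a specific cache file or directory. */
  static String corruptCacheWarning(String path) {
    return String.format(CORRUPT_CACHE_WARNING, path);
  }

  /** Logs the warning; callers pass the path that failed to deserialize. */
  static void warnCorruptCache(String path) {
    logger.warning(corruptCacheWarning(path));
  }

  public static void main(String[] args) {
    warnCorruptCache("/tmp/querylogs_4/.drill.parquet_metadata");
  }
}
```

Both read methods would then call `warnCorruptCache(path.toString())` (or similar), so any later wording change happens once and the log always names the file.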



[jira] [Commented] (DRILL-5660) Drill 1.10 queries fail due to Parquet Metadata "corruption" from DRILL-3867

2017-07-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095778#comment-16095778
 ] 

ASF GitHub Bot commented on DRILL-5660:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/877#discussion_r128683594
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/MetadataContext.java ---
@@ -41,6 +41,10 @@
 
   private PruneStatus pruneStatus = PruneStatus.NOT_STARTED;
 
+  // False values of these flags allow to avoid double reading of corrupted or unsupported metadata files
+  public boolean isMetaCacheFileCorrect = true;
+  public boolean isMetaDirsCacheFileCorrect = true;
--- End diff --

Nice!

How are these states cleared? Do we keep track of the file timestamp, and 
try again if the file is replaced/updated? Or, is there some kind of manual 
reset? Is the state only per query?
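One possible answer to these questions, sketched under assumptions: remember the modification time at which a cache file was found to be unreadable, and allow a re-read once the timestamp changes (for example after REFRESH TABLE METADATA rewrites the file). The class `CacheFileStatus` is hypothetical; in Drill the timestamp would presumably come from Hadoop's `FileStatus.getModificationTime()`, abstracted here as a plain `long`. Whether such state should live in the per-query `MetadataContext` or longer is exactly the open question, and the sketch does not settle it.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical per-path tracker: a cache file marked as corrupt is skipped
 * only while its modification time is unchanged.
 */
final class CacheFileStatus {
  // path -> modification time at which the file was found to be unreadable
  private final Map<String, Long> corruptAt = new HashMap<>();

  /** Remember that the file at this path failed to parse at the given mtime. */
  void markCorrupt(String path, long modificationTime) {
    corruptAt.put(path, modificationTime);
  }

  /** True unless the file was marked corrupt and has not changed since. */
  boolean shouldRead(String path, long currentModificationTime) {
    Long badTime = corruptAt.get(path);
    if (badTime == null) {
      return true;                 // never failed: read it
    }
    if (badTime != currentModificationTime) {
      corruptAt.remove(path);      // file was replaced/updated: try again
      return true;
    }
    return false;                  // same corrupt file: avoid the double read
  }

  public static void main(String[] args) {
    CacheFileStatus status = new CacheFileStatus();
    String path = "/tmp/t/.drill.parquet_metadata";
    status.markCorrupt(path, 1000L);
    System.out.println(status.shouldRead(path, 1000L)); // false: unchanged
    System.out.println(status.shouldRead(path, 2000L)); // true: rewritten
  }
}
```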


> Drill 1.10 queries fail due to Parquet Metadata "corruption" from DRILL-3867
> 
>
> Key: DRILL-5660
> URL: https://issues.apache.org/jira/browse/DRILL-5660
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.11.0
>Reporter: Paul Rogers
>Assignee: Vitalii Diravka
>
> Drill recently accepted a PR for the following bug:
> DRILL-3867: Store relative paths in metadata file
> This PR turned out to have a flaw which affects version compatibility.
> The DRILL-3867 PR changed the format of Parquet metadata files to store 
> relative paths. All Drill servers after that PR create files with relative 
> paths. But, the version number of the file is unchanged, so that older 
> Drillbits don't know that they can't understand the file.
> Instead, if an older Drill tries to read the file, queries fail left and 
> right. Drill will resolve the paths, but does so relative to the user's HDFS 
> home directory, which is wrong.
> What should have happened is that we should have bumped the parquet metadata 
> file version number so that older Drillbits can’t read the file. This ticket 
> requests that we do that.
> Now, one could argue that the lack of version number change is fine. Once a 
> user upgrades Drill, they won't use an old Drillbit. But, things are not that 
> simple:
> * A developer tests a branch based on a pre-DRILL-3867 build on a cluster in 
> which metadata files have been created by a post-DRILL-3867 build. (This has 
> already occurred multiple times in our shop.)
> * A user tries to upgrade to Drill 1.11, finds an issue, and needs to roll 
> back to Drill 1.10. Doing so will cause queries to fail due to 
> seemingly-corrupt metadata files.
> * A user tries to do a rolling upgrade: running 1.11 on some servers, 1.10 on 
> others. Once a 1.11 server is installed, the metadata is updated ("corrupted" 
> from the perspective of 1.10) and queries fail.
> Standard practice in this scenario is to:
> * Bump the file version number when the file format changes, and
> * Have the software refuse to read files with a version newer than it was designed for.
> Of course, it is highly desirable that newer servers read old files, but that 
> is not the issue here.
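The standard practice described above can be sketched as a reader-side check. This is a minimal illustration, assuming versions are written as `major.minor` strings; the class name `MetadataVersionCheck` and the supported version 3.1 are hypothetical and not taken from Drill's code:

```java
/** Hypothetical reader-side gate for metadata file versions of the form "major.minor". */
final class MetadataVersionCheck {
  // Newest format this (hypothetical) reader was designed for.
  static final int SUPPORTED_MAJOR = 3;
  static final int SUPPORTED_MINOR = 1;

  /** Returns true if the file's version is not newer than the supported one. */
  static boolean canRead(String fileVersion) {
    String[] parts = fileVersion.split("\\.");
    int major = Integer.parseInt(parts[0]);
    int minor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
    if (major != SUPPORTED_MAJOR) {
      // Older majors stay readable (newer servers reading old files); newer majors do not.
      return major < SUPPORTED_MAJOR;
    }
    return minor <= SUPPORTED_MINOR;  // same major: reject newer minors
  }

  public static void main(String[] args) {
    for (String v : new String[] {"2", "3.0", "3.1", "3.2", "4.0"}) {
      // A reader that gets false should ignore the cache file rather than misread it.
      System.out.println(v + " -> " + (canRead(v) ? "read" : "ignore"));
    }
  }
}
```

Under this scheme, a format change such as storing relative paths would bump the written version, and an older Drillbit would ignore the file instead of resolving the paths against the wrong directory.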



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-4720) MINDIR() and IMINDIR() functions return no results with metadata cache

2017-07-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095746#comment-16095746
 ] 

ASF GitHub Bot commented on DRILL-4720:
---

Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/864#discussion_r128631369
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/ShowFileHandler.java ---
@@ -93,9 +94,9 @@ public PhysicalPlan getPlan(SqlNode sqlNode) throws ValidationException, RelConv
 
     List rows = new ArrayList<>();
 
-    for (FileStatus fileStatus : fs.list(false, new Path(defaultLocation, fromDir))) {
+    for (FileStatus fileStatus : FileSystemUtil.listAll(fs, new Path(defaultLocation, fromDir), false)) {
       ShowFilesCommandResult result = new ShowFilesCommandResult(fileStatus.getPath().getName(), fileStatus.isDir(),
-          !fileStatus.isDir(), fileStatus.getLen(),
+          !fileStatus.isDirectory(), fileStatus.getLen(),
--- End diff --

Is `fileStatus.isDirectory()` true for symlinks to directories? Would it be 
clearer to use `isFile()`?
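The symlink question can be explored with the JDK's `java.nio.file.Files`, which makes the follow/no-follow choice explicit. This sketch does not use Hadoop's `FileStatus`, whose symlink behavior may differ by file system; it only illustrates the distinction being asked about. `LinkKinds` and `describe` are hypothetical names:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;

final class LinkKinds {
  /** Reports whether a path is a link and how it classifies with and without following links. */
  static String describe(Path p) {
    boolean link = Files.isSymbolicLink(p);
    // isDirectory/isRegularFile follow symlinks unless NOFOLLOW_LINKS is passed.
    boolean dirFollowing = Files.isDirectory(p);
    boolean dirNoFollow = Files.isDirectory(p, LinkOption.NOFOLLOW_LINKS);
    boolean fileFollowing = Files.isRegularFile(p);
    return String.format("link=%b dir=%b dirNoFollow=%b file=%b",
        link, dirFollowing, dirNoFollow, fileFollowing);
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("showfiles");
    Path link = dir.resolveSibling(dir.getFileName() + "-link");
    try {
      Files.createSymbolicLink(link, dir);
      System.out.println(describe(link)); // link=true dir=true dirNoFollow=false file=false
    } catch (UnsupportedOperationException | IOException e) {
      System.out.println("symlinks not supported here: " + e);
    } finally {
      Files.deleteIfExists(link); // removes the link only, not its target
      Files.delete(dir);
    }
  }
}
```

For a link to a directory, a check that follows links says "directory" while a no-follow check says neither directory nor regular file, so `!isDirectory()` and `isFile()` can disagree on exactly the entries the question is about.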


> MINDIR() and IMINDIR() functions return no results with metadata cache
> --
>
> Key: DRILL-4720
> URL: https://issues.apache.org/jira/browse/DRILL-4720
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.7.0
>Reporter: Krystal
>Assignee: Arina Ielchiieva
> Fix For: 1.11.0
>
>
> Parquet directories with a metadata cache return 0 rows for the MINDIR and IMINDIR 
> functions.
> hadoop fs -ls /tmp/querylogs_4
> Found 6 items
> -rwxr-xr-x   3 mapr mapr  15406 2016-06-13 10:18 /tmp/querylogs_4/.drill.parquet_metadata
> drwxr-xr-x   - root root  4 2016-06-13 10:18 /tmp/querylogs_4/1985
> drwxr-xr-x   - root root  3 2016-06-13 10:18 /tmp/querylogs_4/1999
> drwxr-xr-x   - root root  3 2016-06-13 10:18 /tmp/querylogs_4/2005
> drwxr-xr-x   - root root  4 2016-06-13 10:18 /tmp/querylogs_4/2014
> drwxr-xr-x   - root root  6 2016-06-13 10:18 /tmp/querylogs_4/2016
> hadoop fs -ls /tmp/querylogs_4/1985
> Found 4 items
> -rwxr-xr-x   3 mapr mapr   3634 2016-06-13 10:18 /tmp/querylogs_4/1985/.drill.parquet_metadata
> drwxr-xr-x   - root root  2 2016-06-13 10:18 /tmp/querylogs_4/1985/Feb
> drwxr-xr-x   - root root  2 2016-06-13 10:18 /tmp/querylogs_4/1985/apr
> drwxr-xr-x   - root root  2 2016-06-13 10:18 /tmp/querylogs_4/1985/jan
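> SELECT * FROM `dfs.tmp`.`querylogs_4` WHERE dir0 = MINDIR('dfs.tmp','querylogs_4');
> +-----------+-------+------+---------------+----------------+------------+------------+-------+-------+-------+
> | voter_id  | name  | age  | registration  | contributions  | voterzone  | date_time  | dir0  | dir1  | dir2  |
> +-----------+-------+------+---------------+----------------+------------+------------+-------+-------+-------+
> +-----------+-------+------+---------------+----------------+------------+------------+-------+-------+-------+
> No rows selected (0.803 seconds)
> If the metadata cache is removed, the expected data is returned.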
> Here is the physical plan:
> {code}
> 00-00    Screen : rowType = RecordType(ANY *): rowcount = 3.75, cumulative cost = {54.125 rows, 169.125 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 664191
> 00-01      Project(*=[$0]) : rowType = RecordType(ANY *): rowcount = 3.75, cumulative cost = {53.75 rows, 168.75 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 664190
> 00-02        Project(T51¦¦*=[$0]) : rowType = RecordType(ANY T51¦¦*): rowcount = 3.75, cumulative cost = {53.75 rows, 168.75 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 664189
> 00-03          SelectionVectorRemover : rowType = RecordType(ANY T51¦¦*, ANY dir0): rowcount = 3.75, cumulative cost = {53.75 rows, 168.75 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 664188
> 00-04            Filter(condition=[=($1, '.drill.parquet_metadata')]) : rowType = RecordType(ANY T51¦¦*, ANY dir0): rowcount = 3.75, cumulative cost = {50.0 rows, 165.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 664187
> 00-05              Project(T51¦¦*=[$0], dir0=[$1]) : rowType = RecordType(ANY T51¦¦*, ANY dir0): rowcount = 25.0, cumulative cost = {25.0 rows, 50.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 664186
> 00-06                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tmp/querylogs_4/2005/May/voter25.parquet/0_0_0.parquet]], selectionRoot=/tmp/querylogs_4, numFiles=1, usedMetadataFile=true, columns=[`*`]]]) : rowType = (DrillRecordRow[*, dir0]): rowcount = 25.0, 
> 
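The Filter in the plan, `=($1, '.drill.parquet_metadata')`, suggests that MINDIR evaluated to the cache file's name: '.' (0x2E) sorts before the digits '0' to '9' (0x30 to 0x39), so a plain lexicographic minimum over every entry picks `.drill.parquet_metadata` over `1985`. A minimal sketch of a minimum-directory helper that considers only visible subdirectories, assuming names starting with '.' or '_' count as hidden; `MinDirSketch`, `Entry`, and `minDir` are illustrative names, not Drill's implementation:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

final class MinDirSketch {
  /** Minimal stand-in for a directory listing entry (hypothetical). */
  static final class Entry {
    final String name;
    final boolean isDirectory;

    Entry(String name, boolean isDirectory) {
      this.name = name;
      this.isDirectory = isDirectory;
    }
  }

  /** Treats names starting with '.' or '_' as hidden (an assumption). */
  static boolean isHidden(String name) {
    return name.startsWith(".") || name.startsWith("_");
  }

  /** Lexicographically smallest visible subdirectory name, if any. */
  static Optional<String> minDir(List<Entry> entries) {
    return entries.stream()
        .filter(e -> e.isDirectory && !isHidden(e.name))
        .map(e -> e.name)
        .min(String::compareTo);
  }

  public static void main(String[] args) {
    // Mirrors the top-level listing of /tmp/querylogs_4 in the report.
    List<Entry> listing = Arrays.asList(
        new Entry(".drill.parquet_metadata", false),
        new Entry("1985", true), new Entry("1999", true), new Entry("2005", true),
        new Entry("2014", true), new Entry("2016", true));
    String naive = listing.stream().map(e -> e.name).min(String::compareTo).get();
    System.out.println(naive);                 // .drill.parquet_metadata
    System.out.println(minDir(listing).get()); // 1985
  }
}
```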

[jira] [Assigned] (DRILL-5681) Incorrect query result when query uses star and correlated subquery

2017-07-20 Thread Jinfeng Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinfeng Ni reassigned DRILL-5681:
-

Assignee: Jinfeng Ni



[jira] [Assigned] (DRILL-5683) Incorrect query result when query uses NOT(IS NOT NULL) expression

2017-07-20 Thread Jinfeng Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinfeng Ni reassigned DRILL-5683:
-

Assignee: Jinfeng Ni



[jira] [Updated] (DRILL-5683) Incorrect query result when query uses NOT(IS NOT NULL) expression

2017-07-20 Thread Jinfeng Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinfeng Ni updated DRILL-5683:
--
Description: 
The following repro was modified from a test case provided by Arjun Rajan (ara...@mapr.com).

1. Prepare a dataset with nulls.

{code}
create table dfs.tmp.t1 as 
  select r_regionkey, r_name, case when mod(r_regionkey, 3) > 0 then mod(r_regionkey, 3) else null end as flag 
  from cp.`tpch/region.parquet`;

select * from dfs.tmp.t1;
+--------------+--------------+-------+
| r_regionkey  |    r_name    | flag  |
+--------------+--------------+-------+
| 0            | AFRICA       | null  |
| 1            | AMERICA      | 1     |
| 2            | ASIA         | 2     |
| 3            | EUROPE       | null  |
| 4            | MIDDLE EAST  | 1     |
+--------------+--------------+-------+
{code}

2. Query with a NOT(IS NOT NULL) expression in the filter.
{code}
select * from dfs.tmp.t1 where NOT (flag IS NOT NULL);

+--------------+---------+-------+
| r_regionkey  | r_name  | flag  |
+--------------+---------+-------+
| 0            | AFRICA  | null  |
| 3            | EUROPE  | null  |
+--------------+---------+-------+
{code}

3. Switch the run-time code compiler from the default to 'JDK' and get a wrong result.
{code}
alter system set `exec.java_compiler` = 'JDK';

+-------+------------------------------+
|  ok   |           summary            |
+-------+------------------------------+
| true  | exec.java_compiler updated.  |
+-------+------------------------------+

select * from dfs.tmp.t1 where NOT (flag IS NOT NULL);
+--------------+--------------+-------+
| r_regionkey  |    r_name    | flag  |
+--------------+--------------+-------+
| 0            | AFRICA       | null  |
| 1            | AMERICA      | 1     |
| 2            | ASIA         | 2     |
| 3            | EUROPE       | null  |
| 4            | MIDDLE EAST  | 1     |
+--------------+--------------+-------+
{code}

4. A wrong result can also occur when NOT(IS NOT NULL) appears in a Project operator.
{code}
select r_regionkey, r_name, NOT(flag IS NOT NULL) as exp1 from dfs.tmp.t1;
+--------------+--------------+-------+
| r_regionkey  |    r_name    | exp1  |
+--------------+--------------+-------+
| 0            | AFRICA       | true  |
| 1            | AMERICA      | true  |
| 2            | ASIA         | true  |
| 3            | EUROPE       | true  |
| 4            | MIDDLE EAST  | true  |
+--------------+--------------+-------+
{code}





> Incorrect query result when query uses NOT(IS NOT NULL) expression 
> 

[jira] [Created] (DRILL-5683) Incorrect query result when query uses NOT(IS NOT NULL) expression

2017-07-20 Thread Jinfeng Ni (JIRA)
Jinfeng Ni created DRILL-5683:
-

 Summary: Incorrect query result when query uses NOT(IS NOT NULL) 
expression 
 Key: DRILL-5683
 URL: https://issues.apache.org/jira/browse/DRILL-5683
 Project: Apache Drill
  Issue Type: Bug
Reporter: Jinfeng Ni


The following repro was modified from a test case provided by Arjun Rajan (ara...@mapr.com).

1. Prepare a dataset with nulls.

{code}
create table dfs.tmp.t1 as select r_regionkey, r_name, case when mod(r_regionkey, 3) > 0 then mod(r_regionkey, 3) else null end as flag from cp.`tpch/region.parquet`;

select * from dfs.tmp.t1;
+--------------+--------------+-------+
| r_regionkey  |    r_name    | flag  |
+--------------+--------------+-------+
| 0            | AFRICA       | null  |
| 1            | AMERICA      | 1     |
| 2            | ASIA         | 2     |
| 3            | EUROPE       | null  |
| 4            | MIDDLE EAST  | 1     |
+--------------+--------------+-------+
{code}

2. Query with a NOT(IS NOT NULL) expression in the filter.
{code}
select * from dfs.tmp.t1 where NOT (flag IS NOT NULL);

+--------------+---------+-------+
| r_regionkey  | r_name  | flag  |
+--------------+---------+-------+
| 0            | AFRICA  | null  |
| 3            | EUROPE  | null  |
+--------------+---------+-------+
{code}

3. Switch the run-time code compiler from the default to 'JDK' and get a wrong result.
{code}
alter system set `exec.java_compiler` = 'JDK';

+-------+------------------------------+
|  ok   |           summary            |
+-------+------------------------------+
| true  | exec.java_compiler updated.  |
+-------+------------------------------+

select * from dfs.tmp.t1 where NOT (flag IS NOT NULL);
+--------------+--------------+-------+
| r_regionkey  |    r_name    | flag  |
+--------------+--------------+-------+
| 0            | AFRICA       | null  |
| 1            | AMERICA      | 1     |
| 2            | ASIA         | 2     |
| 3            | EUROPE       | null  |
| 4            | MIDDLE EAST  | 1     |
+--------------+--------------+-------+
{code}

4. A wrong result can also occur when NOT(IS NOT NULL) appears in a Project operator.
{code}
select r_regionkey, r_name, NOT(flag IS NOT NULL) as exp1 from dfs.tmp.t1;
+--------------+--------------+-------+
| r_regionkey  |    r_name    | exp1  |
+--------------+--------------+-------+
| 0            | AFRICA       | true  |
| 1            | AMERICA      | true  |
| 2            | ASIA         | true  |
| 3            | EUROPE       | true  |
| 4            | MIDDLE EAST  | true  |
+--------------+--------------+-------+
{code}
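For reference, NOT (x IS NOT NULL) is logically equivalent to x IS NULL, and IS [NOT] NULL never yields UNKNOWN, so the correct answer depends only on which flag values are null. A minimal sketch of the expected values, using a nullable `Integer` to model the INT column (an illustration of the SQL semantics, not Drill's generated code; `NotIsNotNull` is a hypothetical name):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class NotIsNotNull {
  /** SQL "x IS NOT NULL": always TRUE or FALSE, never UNKNOWN. */
  static boolean isNotNull(Integer x) {
    return x != null;
  }

  /** SQL "NOT (x IS NOT NULL)", which should equal "x IS NULL". */
  static boolean notIsNotNull(Integer x) {
    return !isNotNull(x);
  }

  public static void main(String[] args) {
    // flag values of dfs.tmp.t1 for r_regionkey 0..4, as shown in step 1
    List<Integer> flags = Arrays.asList(null, 1, 2, null, 1);
    List<Boolean> exp1 = flags.stream()
        .map(NotIsNotNull::notIsNotNull)
        .collect(Collectors.toList());
    System.out.println(exp1); // [true, false, false, true, false]
  }
}
```

These are the values the default compiler produces in step 2 (only AFRICA and EUROPE pass the filter); the all-`true` column from step 4 disagrees for rows 1, 2 and 4.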






[jira] [Commented] (DRILL-4335) Apache Drill should support network encryption - SASL encryption between Drill Client to Drillbit

2017-07-20 Thread Sorabh Hamirwasia (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095448#comment-16095448
 ] 

Sorabh Hamirwasia commented on DRILL-4335:
--

Repurposing this JIRA to reflect only SASL encryption support between Drill 
Client and Drillbit. I have created [DRILL-5682 | 
https://issues.apache.org/jira/browse/DRILL-5682], which will act as the umbrella 
JIRA for other encryption support.

> Apache Drill should support network encryption - SASL encryption between 
> Drill Client to Drillbit
> -
>
> Key: DRILL-4335
> URL: https://issues.apache.org/jira/browse/DRILL-4335
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Keys Botzum
>Assignee: Sorabh Hamirwasia
>  Labels: doc-impacting, ready-to-commit, security
> Fix For: 1.11.0
>
> Attachments: ApacheDrillEncryptionUsingSASLDesign.pdf
>
>
> This is clearly related to DRILL-291, but I wanted to make it explicit that this 
> needs to include network-level encryption and not just authentication. This 
> is particularly important for the client connection to Drill, which will often 
> be sending passwords in the clear until there is encryption.
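At the Java level, SASL separates authentication-only sessions from encrypted ones through the quality-of-protection (QOP) property: `auth` (authentication only), `auth-int` (adds integrity), and `auth-conf` (adds confidentiality, i.e. encryption). A minimal sketch using the JDK's `javax.security.sasl` API, assuming a client that insists on encryption; this is a generic illustration, not Drill's implementation, and `SaslQopSketch` and its helpers are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.sasl.Sasl;

final class SaslQopSketch {
  /** Properties a client could pass to Sasl.createSaslClient to require encryption. */
  static Map<String, String> encryptionRequiredProps() {
    Map<String, String> props = new HashMap<>();
    // Sasl.QOP is "javax.security.sasl.qop". Listing only auth-conf means the
    // negotiation should fail rather than fall back to plain authentication.
    props.put(Sasl.QOP, "auth-conf");
    return props;
  }

  /** After negotiation, check what was agreed (e.g. the value of getNegotiatedProperty(Sasl.QOP)). */
  static boolean isEncrypted(String negotiatedQop) {
    return "auth-conf".equals(negotiatedQop);
  }

  public static void main(String[] args) {
    System.out.println(encryptionRequiredProps()); // {javax.security.sasl.qop=auth-conf}
    System.out.println(isEncrypted("auth"));       // false: authenticated, but traffic in the clear
  }
}
```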





[jira] [Updated] (DRILL-4335) Apache Drill should support network encryption - SASL encryption between Drill Client to Drillbit

2017-07-20 Thread Sorabh Hamirwasia (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia updated DRILL-4335:
-
Summary: Apache Drill should support network encryption - SASL encryption 
between Drill Client to Drillbit  (was: Apache Drill should support network 
encryption)



[jira] [Commented] (DRILL-5681) Incorrect query result when query uses star and correlated subquery

2017-07-20 Thread Jinfeng Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095433#comment-16095433
 ] 

Jinfeng Ni commented on DRILL-5681:
---

The incorrect query result is caused by a wrong query plan produced when the Drill 
planner applies its decorrelation logic.

For Q1, here is the plan:
{code}
00-00    Screen
00-01      ProjectAllowDup(n_nationkey=[$0], n_name=[$1])
00-02        Project(n_nationkey=[ITEM($0, 'n_nationkey')], n_name=[ITEM($0, 'n_name')])
00-03          SelectionVectorRemover
00-04            Filter(condition=[NOT(IS NOT NULL($2))])
00-05              HashJoin(condition=[=($0, $1)], joinType=[left])
00-07                Project(T28¦¦*=[$0])
00-09                  Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/nation.parquet]], selectionRoot=classpath:/tpch/nation.parquet, numFiles=1, usedMetadataFile=false, columns=[`*`]]])
00-06                HashAgg(group=[{0}], agg#0=[MIN($1)])
00-08                  Project(T29¦¦*=[$1], $f0=[true])
00-10                    HashJoin(condition=[=($0, $2)], joinType=[inner])
00-12                      Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/region.parquet]], selectionRoot=classpath:/tpch/region.parquet, numFiles=1, usedMetadataFile=false, columns=[`r_regionkey`]]])
00-11                      Project(T29¦¦*=[$0], $f1=[ITEM($0, 'n_regionkey')])
00-13                        HashAgg(group=[{0}])
00-14                          Project(T29¦¦*=[$0])
00-15                            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/nation.parquet]], selectionRoot=classpath:/tpch/nation.parquet, numFiles=1, usedMetadataFile=false, columns=[`*`]]])
{code}

Notice that operator 13 (HashAgg) is grouping by the * column, which is 
wrong.
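To make the expected behavior concrete, here is a minimal in-memory sketch of how the NOT EXISTS subquery in Q1 decorrelates when the aggregate groups on the correlation key `n_regionkey` rather than on the whole `*` row: compute the distinct keys, mark those with a matching region (the MIN(true) marker in the plan), then left-join back and keep rows whose marker is null. The nation rows are a subset of those in the report, with region keys taken from the standard TPC-H data; `DecorrelationSketch` and its members are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class DecorrelationSketch {
  /** Hypothetical nation row: key, name, region key. */
  static final class Nation {
    final int key;
    final String name;
    final int regionKey;

    Nation(int key, String name, int regionKey) {
      this.key = key;
      this.name = name;
      this.regionKey = regionKey;
    }
  }

  /** existsMode = true models EXISTS; false models NOT EXISTS. */
  static List<String> filter(List<Nation> nations, Set<Integer> regionKeys, boolean existsMode) {
    // 1. Aggregate on the correlation key only (the plan above aggregates on the `*` row).
    Set<Integer> distinctKeys = new HashSet<>();
    for (Nation n : nations) {
      distinctKeys.add(n.regionKey);
    }
    // 2. Inner-join the distinct keys with region; a matching key gets a TRUE marker.
    Map<Integer, Boolean> marker = new HashMap<>();
    for (Integer k : distinctKeys) {
      if (regionKeys.contains(k)) {
        marker.put(k, Boolean.TRUE);
      }
    }
    // 3. Left-join back to nation on the key: EXISTS keeps non-null markers,
    //    NOT EXISTS keeps rows whose marker IS NULL.
    List<String> out = new ArrayList<>();
    for (Nation n : nations) {
      boolean matched = marker.get(n.regionKey) != null;
      if (matched == existsMode) {
        out.add(n.name);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Nation> nations = Arrays.asList(
        new Nation(0, "ALGERIA", 0), new Nation(1, "ARGENTINA", 1),
        new Nation(2, "BRAZIL", 1), new Nation(24, "UNITED STATES", 1));
    Set<Integer> regionKeys = new HashSet<>(Arrays.asList(0, 1, 2, 3, 4));
    System.out.println(filter(nations, regionKeys, false)); // []
    System.out.println(filter(nations, regionKeys, true));  // [ALGERIA, ARGENTINA, BRAZIL, UNITED STATES]
  }
}
```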




[jira] [Created] (DRILL-5682) Apache Drill should support network encryption

2017-07-20 Thread Sorabh Hamirwasia (JIRA)
Sorabh Hamirwasia created DRILL-5682:


 Summary: Apache Drill should support network encryption
 Key: DRILL-5682
 URL: https://issues.apache.org/jira/browse/DRILL-5682
 Project: Apache Drill
  Issue Type: New Feature
Reporter: Sorabh Hamirwasia
Assignee: Sorabh Hamirwasia


Creating this one to repurpose DRILL-4335 for SASL encryption between Drill 
Client and Drillbit.





[jira] [Created] (DRILL-5681) Incorrect query result when query uses star and correlated subquery

2017-07-20 Thread Jinfeng Ni (JIRA)
Jinfeng Ni created DRILL-5681:
-

 Summary: Incorrect query result when query uses star and 
correlated subquery
 Key: DRILL-5681
 URL: https://issues.apache.org/jira/browse/DRILL-5681
 Project: Apache Drill
  Issue Type: Bug
Reporter: Jinfeng Ni


The following repro was based on a test case provided by Arjun Rajan (ara...@mapr.com).

Drill returns incorrect query results when a query has a correlated subquery 
and queries either a view defined with select * or a subquery with select *.

Case 1: Querying a view defined with select *, plus a correlated subquery
{code}
create view dfs.tmp.nation_view as select * from cp.`tpch/nation.parquet`;
{code}

// Q1: returns 25 rows. The correct answer is 0 rows.
{code}
SELECT n_nationkey, n_name
FROM  dfs.tmp.nation_view a
WHERE NOT EXISTS (SELECT 1
FROM cp.`tpch/region.parquet` b
WHERE b.r_regionkey =  a.n_regionkey
)

+--------------+-----------------+
| n_nationkey  |     n_name      |
+--------------+-----------------+
| 0            | ALGERIA         |
| 1            | ARGENTINA       |
| 2            | BRAZIL          |
...
| 24           | UNITED STATES   |
+--------------+-----------------+
25 rows selected (0.614 seconds)
{code}

// Q2: returns 0 rows. The correct answer is 25 rows.
{code}
SELECT n_nationkey, n_name
FROM  dfs.tmp.nation_view a
WHERE EXISTS (SELECT 1
FROM cp.`tpch/region.parquet` b
WHERE b.r_regionkey =  a.n_regionkey
)
+--------------+---------+
| n_nationkey  | n_name  |
+--------------+---------+
+--------------+---------+
No rows selected (0.4 seconds)
{code}

Case 2: Querying a table expression with select *
// Q3: returns 25 rows. The correct result is 0 rows.
{code}
SELECT n_nationkey, n_name
FROM  (
  SELECT * FROM cp.`tpch/nation.parquet`
) a
WHERE NOT EXISTS (SELECT 1
FROM cp.`tpch/region.parquet` b
WHERE b.r_regionkey =  a.n_regionkey
)
+--------------+-----------------+
| n_nationkey  |     n_name      |
+--------------+-----------------+
| 0            | ALGERIA         |
| 1            | ARGENTINA       |
...
| 24           | UNITED STATES   |
+--------------+-----------------+
25 rows selected (0.451 seconds)
{code}

Q4: return 0 row. The correct result is 25 rows.
{code}
SELECT n_nationkey, n_name
FROM  (
  SELECT * FROM cp.`tpch/nation.parquet`
) a
WHERE EXISTS (SELECT 1
FROM cp.`tpch/region.parquet` b
WHERE b.r_regionkey =  a.n_regionkey
)
+--------------+---------+
| n_nationkey  | n_name  |
+--------------+---------+
+--------------+---------+
No rows selected (0.515 seconds)
{code}
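The expected counts follow from the data: every nation row's n_regionkey matches some r_regionkey, so the EXISTS queries (Q2, Q4) should keep all 25 rows and the NOT EXISTS queries (Q1, Q3) should keep none. A minimal in-memory sketch of those semi-join and anti-join semantics, for comparing against Drill's output. The class name and the sample data are hypothetical (25 nations with an assumed region assignment of k % 5, region keys 0 to 4), not the actual TPC-H rows:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** In-memory reference for what Q1-Q4 should return; not Drill code. */
final class CorrelatedExistsCheck {

    /** Stand-in for a nation row: only the columns the queries touch. */
    static final class Nation {
        final int nationKey;
        final String name;
        final int regionKey;

        Nation(int nationKey, String name, int regionKey) {
            this.nationKey = nationKey;
            this.name = name;
            this.regionKey = regionKey;
        }
    }

    /**
     * Keeps nation rows for which a region row with r_regionkey = n_regionkey exists
     * (exists = true, like WHERE EXISTS) or does not exist (exists = false, like WHERE NOT EXISTS).
     */
    static List<Nation> filter(List<Nation> nations, Set<Integer> regionKeys, boolean exists) {
        List<Nation> result = new ArrayList<>();
        for (Nation n : nations) {
            boolean hasMatch = regionKeys.contains(n.regionKey); // the correlated predicate
            if (hasMatch == exists) {
                result.add(n);
            }
        }
        return result;
    }

    /** Hypothetical sample: 25 nations, nation k assigned to region k % 5 (an assumed mapping). */
    static List<Nation> sampleNations() {
        List<Nation> nations = new ArrayList<>();
        for (int k = 0; k < 25; k++) {
            nations.add(new Nation(k, "nation_" + k, k % 5));
        }
        return nations;
    }

    /** Hypothetical sample: region keys 0 through 4. */
    static Set<Integer> sampleRegionKeys() {
        Set<Integer> keys = new HashSet<>();
        for (int r = 0; r < 5; r++) {
            keys.add(r);
        }
        return keys;
    }

    public static void main(String[] args) {
        List<Nation> nations = sampleNations();
        Set<Integer> regions = sampleRegionKeys();
        System.out.println("EXISTS rows: " + filter(nations, regions, true).size());      // 25
        System.out.println("NOT EXISTS rows: " + filter(nations, regions, false).size()); // 0
    }
}
```

Because the two filters partition the input, the EXISTS and NOT EXISTS results for the same data must always add up to the full row count; Drill's answers above (25 and 0, swapped) violate the expected assignment, not that invariant.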






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (DRILL-5680) BasicPhysicalOpUnitTest can't run in Eclipse with Java 8

2017-07-20 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-5680:
--

 Summary: BasicPhysicalOpUnitTest can't run in Eclipse with Java 8
 Key: DRILL-5680
 URL: https://issues.apache.org/jira/browse/DRILL-5680
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.11.0
Reporter: Paul Rogers
Priority: Minor


A unit test failure was detected in the test {{BasicPhysicalOpUnitTest}}. 
I wanted to run this test in Eclipse to track down the error, but this test 
uses Mockito, which cannot run under Java 8 in Eclipse:

{code}
java.lang.UnsupportedClassVersionError: org/apache/drill/test/DrillTest : 
Unsupported major.minor version 52.0
at java.lang.ClassLoader.defineClass1(Native Method)
{code}
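For reference when reading this trace: class-file major version 52 corresponds to Java 8, and `UnsupportedClassVersionError` is thrown when a JVM older than a class's target release tries to load it. The error therefore typically means the JVM loading `DrillTest` is older than Java 8, so the JRE selected in the Eclipse launch configuration may be worth ruling out (an inference from the message, not something this report confirms). A small sketch that reads the version fields from a class-file header; `ClassFileVersion` is a hypothetical helper name:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Reads the version fields at the start of a .class file (JVM spec, "The class File Format"). */
final class ClassFileVersion {

    private static final int MAGIC = 0xCAFEBABE;

    /** Returns major_version from the header: u4 magic, u2 minor_version, u2 major_version. */
    static int majorVersion(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        if (data.readInt() != MAGIC) {
            throw new IOException("Not a class file: bad magic number");
        }
        data.readUnsignedShort(); // minor_version, not needed here
        return data.readUnsignedShort();
    }

    /** Maps a major version to a Java release; valid from 49 (Java 5) onward, where release = major - 44. */
    static int javaRelease(int major) {
        if (major < 49) {
            throw new IllegalArgumentException("Pre-Java 5 class file version: " + major);
        }
        return major - 44;
    }

    public static void main(String[] args) throws IOException {
        // First 8 bytes of a class compiled for Java 8: magic, minor 0, major 52.
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 52};
        int major = majorVersion(new ByteArrayInputStream(header));
        System.out.println(major + " -> Java " + javaRelease(major)); // 52 -> Java 8
    }
}
```

To inspect a real class, pass a `FileInputStream` over the compiled `.class` file instead of the in-memory header.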






[jira] [Commented] (DRILL-5660) Drill 1.10 queries fail due to Parquet Metadata "corruption" from DRILL-3867

2017-07-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095182#comment-16095182
 ] 

ASF GitHub Bot commented on DRILL-5660:
---

Github user vdiravka commented on the issue:

https://github.com/apache/drill/pull/877
  
@paul-rogers, could you please review the additional commit? The main changes:

1. All IOExceptions thrown while deserializing metadata cache files 
(JsonMappingException, JsonParseException, ...) are now ignored, with 
appropriate logging. To avoid reading such a corrupted or unsupported file 
again, that status is stored in the metadata context.

2. Creating the metadata context (if it is `null`) while expanding the 
Selection from the Metadata Cache makes it possible to always detect the 
partition pruning status.

3. Two new test cases are added: `testCorruptedMetadataFile()` and 
`testFutureUnsupportedMetadataVersion()`.

4. Test data created in `@BeforeClass` in `TestParquetMetadataCache` will 
be removed after all tests are executed. 

5. `testPlanMatchingPatterns()` checks are added to the 
`testMetadataCacheAbsolutePaths()` and `testSpacesInMetadataCachePath()` tests.
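Change 1 amounts to a read-once guard: the first failed deserialization marks the context, and later lookups with the same context skip the file instead of re-reading it. A simplified sketch of that pattern; `SketchMetadataContext`, `CacheParser`, and `CacheFileReader` are hypothetical names (the PR's actual classes differ), and the parser interface stands in for the Jackson deserialization step:

```java
import java.io.IOException;

/** Hypothetical stand-in for a per-query metadata context. */
final class SketchMetadataContext {
    /** Starts true; flipped to false once the cache file is found to be unreadable. */
    boolean metaCacheFileCorrect = true;
}

/** Hypothetical parsing step standing in for JSON deserialization of the cache file. */
interface CacheParser<T> {
    T parse(String path) throws IOException;
}

final class CacheFileReader {

    /**
     * Returns the parsed metadata, or null if the file is unreadable. A failure is recorded in the
     * context, so later calls with the same context return null without touching the file again.
     */
    static <T> T readOrNull(String path, CacheParser<T> parser, SketchMetadataContext ctx) {
        if (!ctx.metaCacheFileCorrect) {
            return null; // already known to be corrupted or unsupported
        }
        try {
            return parser.parse(path);
        } catch (IOException e) {
            // Log and continue without the cache instead of failing the query.
            System.err.println("Ignoring metadata cache file " + path + ": " + e.getMessage());
            ctx.metaCacheFileCorrect = false;
            return null;
        }
    }

    public static void main(String[] args) {
        SketchMetadataContext ctx = new SketchMetadataContext();
        int[] attempts = {0};
        CacheParser<String> broken = p -> {
            attempts[0]++;
            throw new IOException("Unexpected character in JSON");
        };
        String path = "/tmp/example_table/metadata_cache_file"; // hypothetical path
        readOrNull(path, broken, ctx);
        readOrNull(path, broken, ctx);
        System.out.println("parse attempts: " + attempts[0]); // 1
    }
}
```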



> Drill 1.10 queries fail due to Parquet Metadata "corruption" from DRILL-3867
> 
>
> Key: DRILL-5660
> URL: https://issues.apache.org/jira/browse/DRILL-5660
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.11.0
>Reporter: Paul Rogers
>Assignee: Vitalii Diravka
>
> Drill recently accepted a PR for the following bug:
> DRILL-3867: Store relative paths in metadata file
> This PR turned out to have a flaw which affects version compatibility.
> The DRILL-3867 PR changed the format of Parquet metadata files to store 
> relative paths. All Drill servers after that PR create files with relative 
> paths. But the version number of the file is unchanged, so older 
> Drillbits don't know that they can't understand the file.
> Instead, if an older Drillbit tries to read the file, queries fail left and 
> right: it resolves the relative paths against the user's HDFS home 
> directory, which is wrong.
> We should have bumped the Parquet metadata file version number so that 
> older Drillbits can’t read the file. This ticket requests that we do that.
> Now, one could argue that the lack of version number change is fine. Once a 
> user upgrades Drill, they won't use an old Drillbit. But, things are not that 
> simple:
> * A developer tests a branch based on a pre-DRILL-3867 build on a cluster in 
> which metadata files have been created by a post-DRILL-3867 build. (This has 
> already occurred multiple times in our shop.)
> * A user tries to upgrade to Drill 1.11, finds an issue, and needs to roll 
> back to Drill 1.10. Doing so will cause queries to fail due to 
> seemingly-corrupt metadata files.
> * A user tries to do a rolling upgrade: running 1.11 on some servers, 1.10 on 
> others. Once a 1.11 server is installed, the metadata is updated ("corrupted" 
> from the perspective of 1.10) and queries fail.
> Standard practice in this scenario is to:
> * Bump the file version number when the file format changes, and
> * Have software refuse to read files with a version newer than it was 
> designed for.
> Of course, it is highly desirable that newer servers read old files, but that 
> is not the issue here.
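The two rules above can be sketched as an allow-list check: a reader accepts only the versions it was built to understand, so bumping the version on a format change makes older readers refuse the file instead of misinterpreting it. A minimal sketch; `MetadataVersionGate` and `canRead` are hypothetical, and the `v1` through `v3_1` labels are borrowed from the version list quoted in the PR diff (which release knows which label is an assumption here):

```java
import java.util.Arrays;
import java.util.List;

/** Sketch of an allow-list version check for a versioned metadata file (hypothetical names). */
final class MetadataVersionGate {

    /** True only if the reader was built knowing about the file's version; unknown versions are refused. */
    static boolean canRead(List<String> knownVersions, String fileVersion) {
        return fileVersion != null && knownVersions.contains(fileVersion);
    }

    public static void main(String[] args) {
        List<String> olderReader = Arrays.asList("v1", "v2", "v3");
        List<String> newerReader = Arrays.asList("v1", "v2", "v3", "v3_1");

        System.out.println(canRead(olderReader, "v3_1")); // false: older Drillbit refuses the newer file
        System.out.println(canRead(newerReader, "v3"));   // true: newer Drillbit still reads older files
        System.out.println(canRead(newerReader, "v4"));   // false: a future version is refused too
    }
}
```

An allow-list avoids having to define an ordering over version strings such as `v3_1`; the trade-off is that every reader must be rebuilt to accept a new version, which is exactly the behavior the ticket asks for.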





[jira] [Commented] (DRILL-4364) Image Metadata Format Plugin

2017-07-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095101#comment-16095101
 ] 

ASF GitHub Bot commented on DRILL-4364:
---

Github user nagix commented on the issue:

https://github.com/apache/drill/pull/367
  
Congratulations, @cgivre! I will check the latest version of the dependent 
library, and rebase this branch.


> Image Metadata Format Plugin
> 
>
> Key: DRILL-4364
> URL: https://issues.apache.org/jira/browse/DRILL-4364
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Other
>Reporter: Akihiko Kusanagi
>
> Support querying of metadata in various image formats. This plugin leverages 
> [metadata-extractor|https://github.com/drewnoakes/metadata-extractor]. This 
> plugin is especially useful when querying a large number of image files 
> stored in a distributed file system without building a metadata repository 
> in advance.
> This plugin supports the following file formats:
> * JPEG, TIFF, WebP, PSD, PNG, BMP, GIF, ICO, PCX
> * Camera Raw: NEF (Nikon), CR2 (Canon), ORF (Olympus), ARW (Sony), RW2 
> (Panasonic), RWL (Leica), SRW (Samsung)
> This plugin can read the following metadata:
> * Exif, IPTC, XMP, JFIF / JFXX, ICC Profiles, Photoshop fields, WebP 
> properties, PNG properties, BMP properties, GIF properties, ICO properties, 
> PCX properties
> Since each type of metadata has a different set of fields, the plugin returns 
> a set of commonly used fields, such as the image width, height, and bits per 
> pixel, for ease of use.
> *Examples:*
> Querying on a JPEG file with the property descriptive: true
> {noformat}
> 0: jdbc:drill:zk=local> select * from dfs.`4349313028_f69ffa0257_o.jpg`;
> +--+--+--++--+---++-+--++---+--+-+---+--+--+--++--+--+
> | FileName | FileSize | FileDateTime | Format | DPIWidth | DPIHeight | 
> PixelWidth | PixelHeight | BitsPerPixel | Orientaion | ColorMode | HasAlpha | 
> GPS | ExifThumbnail | JFIF | IPTC | JPEG | ExifSubIFD | ExifIFD0 | 
> Interoperability |
> +--+--+--++--+---++-+--++---+--+-+---+--+--+--++--+--+
> | 4349313028_f69ffa0257_o.jpg | 257213 bytes | Mon Feb 01 18:00:56 JST 2016 | 
> JPEG | 96.0 | 96.0 | 1199 | 800 | 24 | Unknown (0) | RGB | false | 
> {"GPSVersionID":".022","GPSLatitudeRef":"N","GPSLatitude":"47° 32' 
> 15.98\"","GPSLongitudeRef":"W","GPSLongitude":"-122° 2' 
> 6.37\"","GPSAltitudeRef":"Sea level","GPSAltitude":"0 metres"} | 
> {"ThumbnailCompression":"JPEG (old-style)","XResolution":"72 dots per 
> inch","YResolution":"72 dots per 
> inch","ResolutionUnit":"Inch","ThumbnailOffset":"414 
> bytes","ThumbnailLength":"7213 bytes"} | 
> {"Version":"1.1","ResolutionUnits":"inch","XResolution":"96 
> dots","YResolution":"96 dots"} | {"Keywords":"135;2002;issaquah;police 
> car;wa;washington"} | {"CompressionType":"Baseline","DataPrecision":"8 
> bits","ImageHeight":"800 pixels","ImageWidth":"1199 
> pixels","NumberOfComponents":"3","Component1":"Y component: Quantization 
> table 0, Sampling factors 2 horiz/2 vert","Component2":"Cb component: 
> Quantization table 1, Sampling factors 1 horiz/1 vert","Component3":"Cr 
> component: Quantization table 1, Sampling factors 1 horiz/1 vert"} | 
> {"ExifVersion":"2.10","UniqueImageID":"d65e93b836d15a0c5e041e6b7258c76e"} | 
> {"Software":"Picasa 3.0"} | {"InteroperabilityIndex":"Unknown (
> )","InteroperabilityVersion":"1.00"} |
> +--+--+--++--+---++-+--++---+--+-+---+--+--+--++--+--+
> 1 row selected (1.712 seconds)
> {noformat}
> Querying on a JPEG file with the property descriptive: false
> {noformat}
> 0: jdbc:drill:zk=local> select * from dfs.`4349313028_f69ffa0257_o.jpg`;
> +--+--+--++--+---++-+--++---+--+-+---+--+--+--++--+--+
> | FileName | FileSize | FileDateTime | Format | DPIWidth | DPIHeight | 
> PixelWidth | PixelHeight | BitsPerPixel | Orientaion | ColorMode | HasAlpha | 
> GPS | ExifThumbnail | JFIF | IPTC | JPEG | ExifSubIFD | ExifIFD0 | 
> Interoperability |
> 
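The plugin is built on metadata-extractor. As a rough, JDK-only illustration of the underlying idea (reading common fields such as format, width, and height from an image header without decoding its pixels), here is a sketch using `javax.imageio` instead; this is a swapped-in technique, not the plugin's code, and it only covers formats ImageIO has readers for, with no Exif, IPTC, or XMP directories. `ImageHeaderProbe` is a hypothetical name:

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

/** Reads format, width, and height from an image header using the JDK's ImageIO. */
final class ImageHeaderProbe {

    static final class Info {
        final String format;
        final int width;
        final int height;

        Info(String format, int width, int height) {
            this.format = format;
            this.width = width;
            this.height = height;
        }
    }

    static Info probe(InputStream in) throws IOException {
        try (ImageInputStream iis = ImageIO.createImageInputStream(in)) {
            if (iis == null) {
                throw new IOException("Could not open an image input stream");
            }
            Iterator<ImageReader> readers = ImageIO.getImageReaders(iis);
            if (!readers.hasNext()) {
                throw new IOException("No ImageIO reader for this format");
            }
            ImageReader reader = readers.next();
            try {
                // seekForwardOnly = true, ignoreMetadata = true: only header fields are needed.
                reader.setInput(iis, true, true);
                return new Info(reader.getFormatName(), reader.getWidth(0), reader.getHeight(0));
            } finally {
                reader.dispose();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Encode a small blank PNG in memory so the example needs no file on disk.
        BufferedImage image = new BufferedImage(120, 80, BufferedImage.TYPE_INT_RGB);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ImageIO.write(image, "png", bytes);

        Info info = probe(new ByteArrayInputStream(bytes.toByteArray()));
        System.out.println(info.format + " " + info.width + "x" + info.height);
    }
}
```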

[jira] [Commented] (DRILL-5660) Drill 1.10 queries fail due to Parquet Metadata "corruption" from DRILL-3867

2017-07-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095035#comment-16095035
 ] 

ASF GitHub Bot commented on DRILL-5660:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/877#discussion_r128579751
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java 
---
@@ -557,7 +556,7 @@ private void readBlockMeta(String path,
 mapper.registerModule(serialModule);
 mapper.registerModule(module);
 mapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);
-FSDataInputStream is = fs.open(p);
+FSDataInputStream is = fs.open(path);
--- End diff --

Answered under Paul's comment.




[jira] [Assigned] (DRILL-5316) C++ Client Crashes When drillbitsVector.count is 0 after zoo_get_children completed with ZOK

2017-07-20 Thread Chun Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chun Chang reassigned DRILL-5316:
-

Assignee: Robert Hou  (was: Chun Chang)

> C++ Client Crashes When drillbitsVector.count is 0 after zoo_get_children 
> completed with ZOK
> 
>
> Key: DRILL-5316
> URL: https://issues.apache.org/jira/browse/DRILL-5316
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - C++
>Reporter: Rob Wu
>Assignee: Robert Hou
>Priority: Critical
>  Labels: ready-to-commit
> Fix For: 1.11.0
>
>
> When connecting to a drillbit through ZooKeeper, the C++ client would 
> occasionally crash for no apparent reason.
> A closer look at the code revealed that during this call 
> rc=zoo_get_children(p_zh.get(), m_path.c_str(), 0, ); 
> zoo_get_children returns ZOK (0) but drillbitsVector.count is 0.
> This leaves drillbits empty, which in turn 
> causes err = zook.getEndPoint(drillbits[drillbits.size() -1], endpoint); to 
> crash.
> A size check should be added to prevent this.
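The fix itself belongs in the C++ client; the Java analogue below only illustrates the size check, with hypothetical names and plain `String` endpoints. If `drillbits` is a standard C++ container, `size() - 1` on an empty one wraps around to a huge unsigned index, which is why the guard must come before the indexing:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

/** Java analogue of the size check the C++ client needs (hypothetical names). */
final class DrillbitPicker {

    /**
     * Returns the last drillbit in the list, or empty if ZooKeeper returned no children.
     * Without the isEmpty() check, drillbits.get(drillbits.size() - 1) would be get(-1) and throw.
     */
    static Optional<String> lastDrillbit(List<String> drillbits) {
        if (drillbits == null || drillbits.isEmpty()) {
            return Optional.empty(); // caller reports "no drillbits registered" instead of crashing
        }
        return Optional.of(drillbits.get(drillbits.size() - 1));
    }

    public static void main(String[] args) {
        System.out.println(lastDrillbit(Collections.<String>emptyList()).isPresent());             // false
        System.out.println(lastDrillbit(Arrays.asList("drillbit-1", "drillbit-2")).orElse("none")); // drillbit-2
    }
}
```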





[jira] [Commented] (DRILL-5660) Drill 1.10 queries fail due to Parquet Metadata "corruption" from DRILL-3867

2017-07-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095004#comment-16095004
 ] 

ASF GitHub Bot commented on DRILL-5660:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/877#discussion_r128502337
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java 
---
@@ -1851,9 +1860,81 @@ private static String relativize(String baseDir, String childPath) {
   .relativize(fullPathWithoutSchemeAndAuthority.toUri()));
   if (relativeFilePath.isAbsolute()) {
 throw new IllegalStateException(String.format("Path %s is not a subpath of %s.",
-basePathWithoutSchemeAndAuthority.toUri().toString(), fullPathWithoutSchemeAndAuthority.toUri().toString()));
+basePathWithoutSchemeAndAuthority.toUri().getPath(), fullPathWithoutSchemeAndAuthority.toUri().getPath()));
+  }
+  return relativeFilePath.toUri().getPath();
+}
+  }
+
+  /**
+   * Used to identify metadata version by the deserialization "metadata_version" first property
+   * from the metadata cache file
+   */
+  public static class MetadataVersion {
+@JsonProperty("metadata_version")
+public String textVersion;
+
+/**
+ * Supported metadata versions.
+ * Note: keep them synchronized with {@link ParquetTableMetadataBase} versions
+ */
+enum Versions {
+  v1(Constants.V1),
+  v2(Constants.V2),
+  v3(Constants.V3),
+  v3_1(Constants.V3_1);
+
+  private final String version;
+
+  Versions(String version) {
+this.version = version;
+  }
+
+  public String getVersion() {
+return version;
+  }
+
+  public static Versions fromString(String version) {
+for (Versions v : Versions.values()) {
+  if (v.version.equalsIgnoreCase(version)) {
+return v;
+  }
+}
+return null;
+  }
+
+  public static class Constants {
+public static final String V1 = "v1";
+public static final String V2 = "v2";
+public static final String V3 = "v3";
+public static final String V3_1 = "v3_1";
+  }
+}
+
+/**
+ * @param fs current file system
+ * @param path of metadata cache file
+ * @return true if metadata version is supported, false otherwise
+ * @throws IOException if parquet metadata can't be deserialized from the json file
+ */
+public static boolean isVersionSupported(FileSystem fs, Path path) throws IOException {
--- End diff --

The new deserialization class for reading the `metadata` version has been 
removed. 
Now we try to deserialize the `metadata` file, and if any subclass of 
`JsonProcessingException` (for example `JsonMappingException` or 
`JsonParseException`) is thrown, the `metadata` is null and is ignored (with 
appropriate logging). To avoid reading such a corrupted or unsupported file 
again, that status is stored in the `metadata context`. 
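The `relativize()` hunk quoted in the diffs above turns an absolute file path into one relative to the table directory and throws if the file lies outside it. A standalone sketch of the same check using only `java.net.URI`; `PathRelativizer` is a hypothetical name. Drill's version appears to wrap the result in a Hadoop `Path`, whose `isAbsolute()` tests for a leading slash; this sketch tests the leading slash directly because `java.net.URI.isAbsolute()` only reports whether a scheme is present:

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Relativizes a file path against a table directory and rejects files outside it (hypothetical names). */
final class PathRelativizer {

    /** Builds a scheme-less URI; this constructor percent-encodes characters such as spaces. */
    private static URI toUri(String path) {
        try {
            return new URI(null, null, path, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid path: " + path, e);
        }
    }

    /** Returns childPath relative to baseDir, e.g. ("/tmp/t", "/tmp/t/a/0.parquet") gives "a/0.parquet". */
    static String relativize(String baseDir, String childPath) {
        URI relative = toUri(baseDir).relativize(toUri(childPath));
        // URI.relativize returns the child URI unchanged when baseDir is not a prefix of it,
        // so a result that still starts with "/" means childPath lies outside baseDir.
        String path = relative.getPath();
        if (path.startsWith("/")) {
            throw new IllegalStateException(
                String.format("Path %s is not a subpath of %s.", childPath, baseDir));
        }
        return path;
    }

    public static void main(String[] args) {
        System.out.println(relativize("/tmp/table", "/tmp/table/part1/0.parquet"));   // part1/0.parquet
        System.out.println(relativize("/tmp/my table", "/tmp/my table/a b.parquet")); // a b.parquet
    }
}
```

`URI.relativize` treats the base as a directory prefix, so a sibling such as `/tmp/table2/x.parquet` is rejected rather than matched on the shared `/tmp/table` string prefix.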



[jira] [Commented] (DRILL-5660) Drill 1.10 queries fail due to Parquet Metadata "corruption" from DRILL-3867

2017-07-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095005#comment-16095005
 ] 

ASF GitHub Bot commented on DRILL-5660:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/877#discussion_r128502875
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java 
---
@@ -1851,9 +1860,81 @@ private static String relativize(String baseDir, String childPath) {
   .relativize(fullPathWithoutSchemeAndAuthority.toUri()));
   if (relativeFilePath.isAbsolute()) {
 throw new IllegalStateException(String.format("Path %s is not a subpath of %s.",
-basePathWithoutSchemeAndAuthority.toUri().toString(), fullPathWithoutSchemeAndAuthority.toUri().toString()));
+basePathWithoutSchemeAndAuthority.toUri().getPath(), fullPathWithoutSchemeAndAuthority.toUri().getPath()));
+  }
+  return relativeFilePath.toUri().getPath();
+}
+  }
+
+  /**
+   * Used to identify metadata version by the deserialization "metadata_version" first property
+   * from the metadata cache file
+   */
+  public static class MetadataVersion {
+@JsonProperty("metadata_version")
+public String textVersion;
+
+/**
+ * Supported metadata versions.
+ * Note: keep them synchronized with {@link ParquetTableMetadataBase} versions
+ */
+enum Versions {
+  v1(Constants.V1),
+  v2(Constants.V2),
+  v3(Constants.V3),
+  v3_1(Constants.V3_1);
+
+  private final String version;
+
+  Versions(String version) {
+this.version = version;
+  }
+
+  public String getVersion() {
+return version;
+  }
+
+  public static Versions fromString(String version) {
+for (Versions v : Versions.values()) {
+  if (v.version.equalsIgnoreCase(version)) {
+return v;
+  }
+}
+return null;
+  }
+
+  public static class Constants {
+public static final String V1 = "v1";
+public static final String V2 = "v2";
+public static final String V3 = "v3";
+public static final String V3_1 = "v3_1";
+  }
+}
+
+/**
+ * @param fs current file system
+ * @param path of metadata cache file
+ * @return true if metadata version is supported, false otherwise
+ * @throws IOException if parquet metadata can't be deserialized from the json file
+ */
+public static boolean isVersionSupported(FileSystem fs, Path path) throws IOException {
+  ObjectMapper mapper = new ObjectMapper();
+  mapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);
+  FSDataInputStream is = fs.open(path);
+
+  MetadataVersion metadataVersion = mapper.readValue(is, MetadataVersion.class);
+  Versions version = Versions.fromString(metadataVersion.textVersion);
+  if (!(version == null)) {
--- End diff --

The `enum` and this part of the code have been removed from this PR. Thanks.



[jira] [Commented] (DRILL-5660) Drill 1.10 queries fail due to Parquet Metadata "corruption" from DRILL-3867

2017-07-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094996#comment-16094996
 ] 

ASF GitHub Bot commented on DRILL-5660:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/877#discussion_r128503448
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java
 ---
@@ -723,19 +723,20 @@ private void init(MetadataContext metaContext) throws IOException {
 // if querying a single file we can look up the metadata directly from the file
 metaPath = new Path(p, Metadata.METADATA_FILENAME);
   }
-  if (metaPath != null && fs.exists(metaPath)) {
+  if (metaPath != null && fs.exists(metaPath) && Metadata.MetadataVersion.isVersionSupported(fs, metaPath)) {
--- End diff --

Agreed, thanks. 
This code was deleted while addressing another comment. 




[jira] [Commented] (DRILL-5660) Drill 1.10 queries fail due to Parquet Metadata "corruption" from DRILL-3867

2017-07-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095006#comment-16095006
 ] 

ASF GitHub Bot commented on DRILL-5660:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/877#discussion_r128564452
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetMetadataCache.java
 ---
@@ -446,10 +447,25 @@ public void testMetadataCacheAbsolutePaths() throws Exception {
 }
   }
 
+  @Test
--- End diff --

Two test cases are added: `testFutureUnsupportedMetadataVersion()` and 
`testCorruptedMetadataFile()`. 

The first covers future versions that are not in the 
MetadataVersion.SUPPORTED_VERSIONS list. It uses `v4` for now (since the latest 
version is `v3_1`). Previously this case threw `JsonMappingException`. 

The second uses a metadata file with a corrupted JSON part. Previously this 
case threw `JsonParseException`. 

Note: to handle any `json` deserialization exception, we catch 
`JsonProcessingException` (the parent class of both exceptions above).




[jira] [Commented] (DRILL-5660) Drill 1.10 queries fail due to Parquet Metadata "corruption" from DRILL-3867

2017-07-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095011#comment-16095011
 ] 

ASF GitHub Bot commented on DRILL-5660:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/877#discussion_r128498692
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java 
---
@@ -1851,9 +1860,81 @@ private static String relativize(String baseDir, String childPath) {
   .relativize(fullPathWithoutSchemeAndAuthority.toUri()));
   if (relativeFilePath.isAbsolute()) {
 throw new IllegalStateException(String.format("Path %s is not a subpath of %s.",
-basePathWithoutSchemeAndAuthority.toUri().toString(), fullPathWithoutSchemeAndAuthority.toUri().toString()));
+basePathWithoutSchemeAndAuthority.toUri().getPath(), fullPathWithoutSchemeAndAuthority.toUri().getPath()));
+  }
+  return relativeFilePath.toUri().getPath();
+}
+  }
+
+  /**
+   * Used to identify metadata version by the deserialization "metadata_version" first property
+   * from the metadata cache file
+   */
+  public static class MetadataVersion {
+@JsonProperty("metadata_version")
+public String textVersion;
+
+/**
+ * Supported metadata versions.
+ * Note: keep them synchronized with {@link ParquetTableMetadataBase} versions
+ */
+enum Versions {
+  v1(Constants.V1),
+  v2(Constants.V2),
+  v3(Constants.V3),
+  v3_1(Constants.V3_1);
+
+  private final String version;
+
+  Versions(String version) {
+this.version = version;
+  }
+
+  public String getVersion() {
+return version;
+  }
+
+  public static Versions fromString(String version) {
+for (Versions v : Versions.values()) {
+  if (v.version.equalsIgnoreCase(version)) {
+return v;
+  }
+}
+return null;
+  }
+
+  public static class Constants {
+public static final String V1 = "v1";
+public static final String V2 = "v2";
+public static final String V3 = "v3";
+public static final String V3_1 = "v3_1";
+  }
+}
+
+/**
--- End diff --

Done.


> Drill 1.10 queries fail due to Parquet Metadata "corruption" from DRILL-3867
> 
>
> Key: DRILL-5660
> URL: https://issues.apache.org/jira/browse/DRILL-5660
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.11.0
>Reporter: Paul Rogers
>Assignee: Vitalii Diravka
>
> Drill recently accepted a PR for the following bug:
> DRILL-3867: Store relative paths in metadata file
> This PR turned out to have a flaw which affects version compatibility.
> The DRILL-3867 PR changed the format of Parquet metadata files to store 
> relative paths. All Drill servers after that PR create files with relative 
> paths. But, the version number of the file is unchanged, so that older 
> Drillbits don't know that they can't understand the file.
> Instead, if an older Drill tries to read the file, queries fail left and 
> right. Drill will resolve the paths, but does so relative to the user's HDFS 
> home directory, which is wrong.
> What we should have done is bump the Parquet metadata file version number so 
> that older Drillbits can’t read the file. This ticket requests that we do that.
> Now, one could argue that leaving the version number unchanged is fine: once a 
> user upgrades Drill, they won't use an old Drillbit. But things are not that 
> simple:
> * A developer tests a branch based on a pre-DRILL-3867 build on a cluster in 
> which metadata files have been created by a post-DRILL-3867 build. (This has 
> already occurred multiple times in our shop.)
> * A user tries to upgrade to Drill 1.11, finds an issue, and needs to roll 
> back to Drill 1.10. Doing so will cause queries to fail due to 
> seemingly-corrupt metadata files.
> * A user tries to do a rolling upgrade: running 1.11 on some servers, 1.10 on 
> others. Once a 1.11 server is installed, the metadata is updated ("corrupted" 
> from the perspective of 1.10) and queries fail.
> Standard practice in this scenario is to:
> * Bump the file version number when the file format changes, and
> * Have the software refuse to read files with a version newer than it was 
> designed for.
> Of course, it is highly desirable that newer servers read old files, but that 
> is not the issue here.
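A minimal Java sketch of the rule in the last two bullets follows. It is not Drill code: `VersionGate` and `canRead` are hypothetical names, and the ordered list simply reuses the `v1`, `v2`, `v3`, `v3_1` strings from the PR under discussion. A build accepts any file up to its own newest version and treats later or unknown versions as unreadable.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of "read older files, refuse newer ones".
// The version strings mirror the PR; the class itself is illustrative only.
final class VersionGate {
  // Versions in the order they were introduced (index = relative age).
  private static final List<String> KNOWN = Arrays.asList("v1", "v2", "v3", "v3_1");

  private final int newestSupported;

  VersionGate(String newestSupported) {
    int idx = KNOWN.indexOf(newestSupported);
    if (idx < 0) {
      throw new IllegalArgumentException("Unknown version: " + newestSupported);
    }
    this.newestSupported = idx;
  }

  /** True if a file written with {@code fileVersion} is safe for this build to read. */
  boolean canRead(String fileVersion) {
    int idx = KNOWN.indexOf(fileVersion);
    // An unknown version is treated as newer than anything this build supports.
    return idx >= 0 && idx <= newestSupported;
  }

  public static void main(String[] args) {
    VersionGate oldBuild = new VersionGate("v3");   // e.g. a pre-fix Drillbit
    VersionGate newBuild = new VersionGate("v3_1"); // a build that writes relative paths
    System.out.println(oldBuild.canRead("v3_1")); // false: refuse the newer file
    System.out.println(newBuild.canRead("v3"));   // true: still reads older files
  }
}
```

A real pre-fix build would not list `v3_1` at all; the unknown-version branch in `canRead` covers that case.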



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5660) Drill 1.10 queries fail due to Parquet Metadata "corruption" from DRILL-3867

2017-07-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095008#comment-16095008
 ] 

ASF GitHub Bot commented on DRILL-5660:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/877#discussion_r128502487
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java 
---
@@ -1851,9 +1860,81 @@ private static String relativize(String baseDir, String childPath) {
   .relativize(fullPathWithoutSchemeAndAuthority.toUri()));
   if (relativeFilePath.isAbsolute()) {
 throw new IllegalStateException(String.format("Path %s is not a subpath of %s.",
-basePathWithoutSchemeAndAuthority.toUri().toString(), fullPathWithoutSchemeAndAuthority.toUri().toString()));
+basePathWithoutSchemeAndAuthority.toUri().getPath(), fullPathWithoutSchemeAndAuthority.toUri().getPath()));
+  }
+  return relativeFilePath.toUri().getPath();
+}
+  }
+
+  /**
+   * Used to identify metadata version by the deserialization "metadata_version" first property
+   * from the metadata cache file
+   */
+  public static class MetadataVersion {
+@JsonProperty("metadata_version")
+public String textVersion;
+
+/**
+ * Supported metadata versions.
+ * Note: keep them synchronized with {@link ParquetTableMetadataBase} versions
+ */
+enum Versions {
+  v1(Constants.V1),
+  v2(Constants.V2),
+  v3(Constants.V3),
+  v3_1(Constants.V3_1);
+
+  private final String version;
+
+  Versions(String version) {
+this.version = version;
+  }
+
+  public String getVersion() {
+return version;
+  }
+
+  public static Versions fromString(String version) {
+for (Versions v : Versions.values()) {
+  if (v.version.equalsIgnoreCase(version)) {
+return v;
+  }
+}
+return null;
+  }
+
+  public static class Constants {
+public static final String V1 = "v1";
+public static final String V2 = "v2";
+public static final String V3 = "v3";
+public static final String V3_1 = "v3_1";
+  }
+}
+
+/**
+ * @param fs current file system
+ * @param path of metadata cache file
+ * @return true if metadata version is supported, false otherwise
+ * @throws IOException if parquet metadata can't be deserialized from the json file
+ */
+public static boolean isVersionSupported(FileSystem fs, Path path) throws IOException {
+  ObjectMapper mapper = new ObjectMapper();
+  mapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);
+  FSDataInputStream is = fs.open(path);
--- End diff --

Answered above.
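As a rough illustration of what `isVersionSupported` does (read only the `metadata_version` property and compare it with the known versions), here is a JDK-only sketch. It assumes a regex is an acceptable stand-in for the Jackson binding in the diff, which it is not for arbitrary JSON; `VersionPeek` and its supported set are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified stand-in for MetadataVersion.isVersionSupported(). The PR binds
// "metadata_version" with Jackson; this sketch uses a regex instead, so it
// ignores JSON escaping and nesting and is for illustration only.
final class VersionPeek {
  private static final Pattern VERSION_FIELD =
      Pattern.compile("\"metadata_version\"\\s*:\\s*\"([^\"]*)\"");

  // Versions this hypothetical build understands (lower-case).
  private static final Set<String> SUPPORTED =
      new HashSet<>(Arrays.asList("v1", "v2", "v3", "v3_1"));

  /** Returns the "metadata_version" value, or null if the property is absent. */
  static String readVersion(InputStream in) throws IOException {
    // try-with-resources closes the stream on every path, including errors.
    try (InputStream is = in) {
      ByteArrayOutputStream buf = new ByteArrayOutputStream();
      byte[] chunk = new byte[4096];
      int n;
      while ((n = is.read(chunk)) != -1) {
        buf.write(chunk, 0, n);
      }
      Matcher m = VERSION_FIELD.matcher(new String(buf.toByteArray(), StandardCharsets.UTF_8));
      return m.find() ? m.group(1) : null;
    }
  }

  /** Case-insensitive check, in the spirit of Versions.fromString(). */
  static boolean isVersionSupported(InputStream in) throws IOException {
    String version = readVersion(in);
    return version != null && SUPPORTED.contains(version.toLowerCase(Locale.ROOT));
  }

  public static void main(String[] args) throws IOException {
    String json = "{\"metadata_version\" : \"v3_1\", \"files\" : []}";
    InputStream in = new ByteArrayInputStream(json.getBytes(StandardCharsets.UTF_8));
    System.out.println(isVersionSupported(in)); // true
  }
}
```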


[jira] [Commented] (DRILL-5660) Drill 1.10 queries fail due to Parquet Metadata "corruption" from DRILL-3867

2017-07-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094999#comment-16094999
 ] 

ASF GitHub Bot commented on DRILL-5660:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/877#discussion_r128498212
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java 
---
@@ -1851,9 +1860,81 @@ private static String relativize(String baseDir, String childPath) {
   .relativize(fullPathWithoutSchemeAndAuthority.toUri()));
   if (relativeFilePath.isAbsolute()) {
 throw new IllegalStateException(String.format("Path %s is not a subpath of %s.",
-basePathWithoutSchemeAndAuthority.toUri().toString(), fullPathWithoutSchemeAndAuthority.toUri().toString()));
+basePathWithoutSchemeAndAuthority.toUri().getPath(), fullPathWithoutSchemeAndAuthority.toUri().getPath()));
+  }
+  return relativeFilePath.toUri().getPath();
+}
+  }
+
+  /**
+   * Used to identify metadata version by the deserialization "metadata_version" first property
+   * from the metadata cache file
+   */
+  public static class MetadataVersion {
+@JsonProperty("metadata_version")
+public String textVersion;
+
+/**
+ * Supported metadata versions.
+ * Note: keep them synchronized with {@link ParquetTableMetadataBase} versions
+ */
+enum Versions {
--- End diff --

Answered above. Thanks.
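The `relativize` hunk in this diff strips scheme and authority and reports `getPath()` rather than `toString()`. The sketch below shows the same idea with `java.net.URI` instead of Hadoop's `Path`, so it is an approximation, not Drill's implementation; `Relativizer` is a hypothetical name. It relies on `URI.relativize` returning its argument unchanged when the child is not under the base.

```java
import java.net.URI;

// Illustrative sketch of relativizing a file path against a table root with
// java.net.URI. Drill's relativize() works on Hadoop Path objects; this is
// only the same idea expressed with the JDK's URI class.
final class Relativizer {
  static String relativize(String baseDir, String childPath) {
    // Keep only the raw path, dropping scheme and authority (e.g. hdfs://nn:8020).
    URI base = URI.create(URI.create(baseDir).getRawPath());
    URI child = URI.create(URI.create(childPath).getRawPath());
    URI relative = base.relativize(child);
    // URI.relativize() returns its argument unchanged when it is not under the
    // base, so a result that still starts with "/" is not a subpath.
    if (relative.getPath().startsWith("/")) {
      throw new IllegalStateException(String.format(
          "Path %s is not a subpath of %s.", child.getPath(), base.getPath()));
    }
    // getPath() returns the decoded path, whereas toString() keeps %-escapes.
    return relative.getPath();
  }

  public static void main(String[] args) {
    System.out.println(relativize(
        "hdfs://nn:8020/tmp/table", "hdfs://nn:8020/tmp/table/2017/a.parquet"));
    // prints 2017/a.parquet
  }
}
```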


[jira] [Commented] (DRILL-5660) Drill 1.10 queries fail due to Parquet Metadata "corruption" from DRILL-3867

2017-07-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095007#comment-16095007
 ] 

ASF GitHub Bot commented on DRILL-5660:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/877#discussion_r128497912
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java 
---
@@ -1377,7 +1386,7 @@ public void serialize(ColumnMetadata_v2 value, JsonGenerator jgen, SerializerPro
*
* Difference between v3 and v2 : min/max, type_length, precision, scale, repetitionLevel, definitionLevel
*/
-  @JsonTypeName("v3") public static class ParquetTableMetadata_v3 extends ParquetTableMetadataBase {
+  @JsonTypeName("v3_1") public static class ParquetTableMetadata_v3 extends ParquetTableMetadataBase {
--- End diff --

Done.


[jira] [Commented] (DRILL-5660) Drill 1.10 queries fail due to Parquet Metadata "corruption" from DRILL-3867

2017-07-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095001#comment-16095001
 ] 

ASF GitHub Bot commented on DRILL-5660:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/877#discussion_r128499798
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java 
---
@@ -1851,9 +1860,81 @@ private static String relativize(String baseDir, String childPath) {
   .relativize(fullPathWithoutSchemeAndAuthority.toUri()));
   if (relativeFilePath.isAbsolute()) {
 throw new IllegalStateException(String.format("Path %s is not a subpath of %s.",
-basePathWithoutSchemeAndAuthority.toUri().toString(), fullPathWithoutSchemeAndAuthority.toUri().toString()));
+basePathWithoutSchemeAndAuthority.toUri().getPath(), fullPathWithoutSchemeAndAuthority.toUri().getPath()));
+  }
+  return relativeFilePath.toUri().getPath();
+}
+  }
+
+  /**
+   * Used to identify metadata version by the deserialization "metadata_version" first property
+   * from the metadata cache file
+   */
+  public static class MetadataVersion {
+@JsonProperty("metadata_version")
+public String textVersion;
+
+/**
+ * Supported metadata versions.
+ * Note: keep them synchronized with {@link ParquetTableMetadataBase} versions
+ */
+enum Versions {
+  v1(Constants.V1),
+  v2(Constants.V2),
+  v3(Constants.V3),
+  v3_1(Constants.V3_1);
+
+  private final String version;
+
+  Versions(String version) {
+this.version = version;
+  }
+
+  public String getVersion() {
+return version;
+  }
+
+  public static Versions fromString(String version) {
+for (Versions v : Versions.values()) {
+  if (v.version.equalsIgnoreCase(version)) {
+return v;
+  }
+}
+return null;
+  }
+
+  public static class Constants {
+public static final String V1 = "v1";
+public static final String V2 = "v2";
+public static final String V3 = "v3";
+public static final String V3_1 = "v3_1";
+  }
+}
+
+/**
--- End diff --

Done.


[jira] [Commented] (DRILL-5660) Drill 1.10 queries fail due to Parquet Metadata "corruption" from DRILL-3867

2017-07-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095000#comment-16095000
 ] 

ASF GitHub Bot commented on DRILL-5660:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/877#discussion_r128496657
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java 
---
@@ -557,7 +556,7 @@ private void readBlockMeta(String path,
 mapper.registerModule(serialModule);
 mapper.registerModule(module);
 mapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);
-FSDataInputStream is = fs.open(p);
+FSDataInputStream is = fs.open(path);
--- End diff --

The stream is closed once ObjectMapper.readValue() finishes deserializing 
([link](https://github.com/apache/drill/blob/9cf6faa7aa834c7ba654ce956c8b523ff3464658/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java#L580)).
But try-with-resources is a good way to guarantee that the stream is also closed 
if an exception is thrown before deserialization.
Thanks. Done.
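The try-with-resources point can be checked with a small JDK-only sketch. `TrackingStream` and `readMetadata` are hypothetical helpers, and the early `IOException` stands in for any failure that happens before `ObjectMapper.readValue()` runs.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: with try-with-resources the stream is closed even when
// an exception is thrown before deserialization starts.
final class CloseDemo {
  /** Wraps a byte array and records whether close() was called. */
  static final class TrackingStream extends ByteArrayInputStream {
    boolean closed;

    TrackingStream(byte[] data) {
      super(data);
    }

    @Override
    public void close() throws IOException {
      closed = true;
      super.close();
    }
  }

  static void readMetadata(TrackingStream in, boolean failEarly) throws IOException {
    try (InputStream is = in) {
      if (failEarly) {
        // Simulates a failure that happens before readValue() would run.
        throw new IOException("failed before deserializing");
      }
      while (is.read() != -1) {
        // Consume the stream (stand-in for deserialization).
      }
    }
  }

  public static void main(String[] args) {
    TrackingStream s = new TrackingStream("{}".getBytes(StandardCharsets.UTF_8));
    try {
      readMetadata(s, true);
    } catch (IOException expected) {
      // The failure is expected in this demo.
    }
    System.out.println(s.closed); // true
  }
}
```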


[jira] [Commented] (DRILL-5660) Drill 1.10 queries fail due to Parquet Metadata "corruption" from DRILL-3867

2017-07-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095010#comment-16095010
 ] 

ASF GitHub Bot commented on DRILL-5660:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/877#discussion_r128504130
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java 
---
@@ -723,19 +723,20 @@ private void init(MetadataContext metaContext) throws IOException {
 // if querying a single file we can look up the metadata directly from the file
 metaPath = new Path(p, Metadata.METADATA_FILENAME);
   }
-  if (metaPath != null && fs.exists(metaPath)) {
+  if (metaPath != null && fs.exists(metaPath) && Metadata.MetadataVersion.isVersionSupported(fs, metaPath)) {
 usedMetadataCache = true;
-parquetTableMetadata = Metadata.readBlockMeta(fs, metaPath.toString(), metaContext, formatConfig);
+parquetTableMetadata = Metadata.readBlockMeta(fs, metaPath, metaContext, formatConfig);
   } else {
 parquetTableMetadata = Metadata.getParquetTableMetadata(fs, p.toString(), formatConfig);
   }
 } else {
   Path p = Path.getPathWithoutSchemeAndAuthority(new Path(selectionRoot));
   metaPath = new Path(p, Metadata.METADATA_FILENAME);
-  if (fs.isDirectory(new Path(selectionRoot)) && fs.exists(metaPath)) {
+  if (fs.isDirectory(new Path(selectionRoot)) && fs.exists(metaPath)
--- End diff --

A different approach is now used to detect unsupported versions, so this is 
no longer an issue.
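The `readBlockMeta` change quoted at the top of this archive appears to take that route: it catches the `IOException`, sets `isMetaCacheFileCorrect` to false on the `MetadataContext`, and the method is now annotated `@Nullable`. A hypothetical sketch of the caller side, assuming the planner then falls back to reading Parquet footers, might look like this; `CacheOrFooters`, `Context`, and `IoSupplier` are illustrative names only.

```java
import java.io.IOException;

// Hypothetical sketch of "try the metadata cache, fall back to footers".
// Context stands in for MetadataContext; all names are illustrative only.
final class CacheOrFooters {
  /** Remembers whether the cache file turned out to be usable. */
  static final class Context {
    boolean metaCacheFileCorrect = true;
  }

  /** A supplier that may throw IOException, like a file reader. */
  interface IoSupplier<T> {
    T get() throws IOException;
  }

  /** Returns cached metadata, or null if the cache is (or becomes) unusable. */
  static <T> T readCache(IoSupplier<T> cacheReader, Context ctx) {
    if (!ctx.metaCacheFileCorrect) {
      return null; // already known to be bad; do not re-read it
    }
    try {
      return cacheReader.get();
    } catch (IOException e) {
      ctx.metaCacheFileCorrect = false; // e.g. unreadable or unsupported version
      return null;
    }
  }

  /** Uses the cache when possible, otherwise reads the Parquet footers. */
  static <T> T load(IoSupplier<T> cacheReader, IoSupplier<T> footerReader, Context ctx)
      throws IOException {
    T fromCache = readCache(cacheReader, ctx);
    return fromCache != null ? fromCache : footerReader.get();
  }

  public static void main(String[] args) throws IOException {
    Context ctx = new Context();
    String result = CacheOrFooters.<String>load(
        () -> { throw new IOException("unsupported metadata version"); },
        () -> "metadata from footers",
        ctx);
    System.out.println(result);                   // metadata from footers
    System.out.println(ctx.metaCacheFileCorrect); // false
  }
}
```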


[jira] [Commented] (DRILL-5660) Drill 1.10 queries fail due to Parquet Metadata "corruption" from DRILL-3867

2017-07-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094998#comment-16094998
 ] 

ASF GitHub Bot commented on DRILL-5660:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/877#discussion_r128496785
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java 
---
@@ -658,11 +657,21 @@ private boolean tableModified(List directories, Path metaFilePath,
 return false;
   }
 
+  /**
+   * Basic class for parquet metadata. Inheritors of this class are json serializable structures of
+   * different versions metadata cache files.
+   *
--- End diff --

Thanks. Done. 


[jira] [Commented] (DRILL-5660) Drill 1.10 queries fail due to Parquet Metadata "corruption" from DRILL-3867

2017-07-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095009#comment-16095009
 ] 

ASF GitHub Bot commented on DRILL-5660:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/877#discussion_r128495285
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java 
---
@@ -537,13 +537,12 @@ private void writeFile(ParquetTableMetadataDirs parquetTableMetadataDirs, Path p
* @return
* @throws IOException
*/
-  private void readBlockMeta(String path,
+  private void readBlockMeta(Path path,
--- End diff --

Done.


[jira] [Commented] (DRILL-5660) Drill 1.10 queries fail due to Parquet Metadata "corruption" from DRILL-3867

2017-07-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094995#comment-16094995
 ] 

ASF GitHub Bot commented on DRILL-5660:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/877#discussion_r128498272
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java 
---
@@ -1851,9 +1860,81 @@ private static String relativize(String baseDir, String childPath) {
   .relativize(fullPathWithoutSchemeAndAuthority.toUri()));
   if (relativeFilePath.isAbsolute()) {
 throw new IllegalStateException(String.format("Path %s is not a subpath of %s.",
-basePathWithoutSchemeAndAuthority.toUri().toString(), fullPathWithoutSchemeAndAuthority.toUri().toString()));
+basePathWithoutSchemeAndAuthority.toUri().getPath(), fullPathWithoutSchemeAndAuthority.toUri().getPath()));
+  }
+  return relativeFilePath.toUri().getPath();
+}
+  }
+
+  /**
+   * Used to identify metadata version by the deserialization "metadata_version" first property
+   * from the metadata cache file
+   */
+  public static class MetadataVersion {
+@JsonProperty("metadata_version")
+public String textVersion;
+
+/**
+ * Supported metadata versions.
+ * Note: keep them synchronized with {@link ParquetTableMetadataBase} versions
+ */
+enum Versions {
+  v1(Constants.V1),
+  v2(Constants.V2),
+  v3(Constants.V3),
+  v3_1(Constants.V3_1);
+
+  private final String version;
+
+  Versions(String version) {
+this.version = version;
+  }
+
+  public String getVersion() {
+return version;
+  }
+
+  public static Versions fromString(String version) {
--- End diff --

Done.


> Drill 1.10 queries fail due to Parquet Metadata "corruption" from DRILL-3867
> 
>
> Key: DRILL-5660
> URL: https://issues.apache.org/jira/browse/DRILL-5660
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.11.0
>Reporter: Paul Rogers
>Assignee: Vitalii Diravka
>
> Drill recently accepted a PR for the following bug:
> DRILL-3867: Store relative paths in metadata file
> This PR turned out to have a flaw which affects version compatibility.
> The DRILL-3867 PR changed the format of Parquet metadata files to store 
> relative paths. All Drill servers after that PR create files with relative 
> paths. But, the version number of the file is unchanged, so that older 
> Drillbits don't know that they can't understand the file.
> Instead, if an older Drill tries to read the file, queries fail left and 
> right. Drill will resolve the paths, but does so relative to the user's HDFS 
> home directory, which is wrong.
> What we should have done is bump the parquet metadata 
> file version number so that older Drillbits can’t read the file. This ticket 
> requests that we do that.
> Now, one could argue that the lack of version number change is fine. Once a 
> user upgrades Drill, they won't use an old Drillbit. But, things are not that 
> simple:
> * A developer tests a branch based on a pre-DRILL-3867 build on a cluster in 
> which metadata files have been created by a post-DRILL-3867 build. (This has 
> already occurred multiple times in our shop.)
> * A user tries to upgrade to Drill 1.11, finds an issue, and needs to roll 
> back to Drill 1.10. Doing so will cause queries to fail due to 
> seemingly-corrupt metadata files.
> * A user tries to do a rolling upgrade: running 1.11 on some servers, 1.10 on 
> others. Once a 1.11 server is installed, the metadata is updated ("corrupted" 
> from the perspective of 1.10) and queries fail.
> Standard practice in this scenario is to:
> * Bump the file version number when the file format changes, and
> * Have software refuse to read files with a version newer than the one it was 
> designed for.
> Of course, it is highly desirable that newer servers read old files, but that 
> is not the issue here.
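The "standard practice" above can be sketched as a version gate. This is a minimal sketch, not Drill's implementation: it assumes version strings shaped like the `v3` and `v3_1` values that appear in the PR #877 diffs, parsed as a major and a minor number, and the class name and the `MAX_MAJOR`/`MAX_MINOR` limits are hypothetical. Older versions are accepted; anything newer than the build's limit is refused rather than misread.

```java
/**
 * Hypothetical version gate: accept metadata written by this build or an
 * older one, refuse anything newer. Not Drill code.
 */
final class MetadataVersionGate {
  // Assumed newest version this build was designed for: v3_1.
  static final int MAX_MAJOR = 3;
  static final int MAX_MINOR = 1;

  /** Parses "v3" or "v3_1" into {major, minor}; a missing minor counts as 0. */
  static int[] parse(String version) {
    if (version == null || !version.startsWith("v")) {
      throw new IllegalArgumentException("Malformed metadata version: " + version);
    }
    String[] parts = version.substring(1).split("_");
    int major = Integer.parseInt(parts[0]);
    int minor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
    return new int[] {major, minor};
  }

  /** True if the version is not newer than MAX_MAJOR.MAX_MINOR. */
  static boolean canRead(String version) {
    int[] v = parse(version);
    if (v[0] != MAX_MAJOR) {
      return v[0] < MAX_MAJOR;
    }
    return v[1] <= MAX_MINOR;
  }

  /** Fails loudly instead of misreading a file written by a newer build. */
  static void checkReadable(String version) {
    if (!canRead(version)) {
      throw new IllegalStateException("Metadata version " + version
          + " is newer than this build supports; refusing to read it");
    }
  }

  public static void main(String[] args) {
    for (String v : new String[] {"v1", "v3", "v3_1", "v3_2", "v4"}) {
      System.out.println(v + " readable: " + canRead(v));
    }
  }
}
```

With the limit at `v3_1`, versions `v1` through `v3_1` are readable while `v3_2` and `v4` are rejected, which is the behavior an older Drillbit needs when it meets a file written by a newer one.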





[jira] [Commented] (DRILL-5660) Drill 1.10 queries fail due to Parquet Metadata "corruption" from DRILL-3867

2017-07-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095002#comment-16095002
 ] 

ASF GitHub Bot commented on DRILL-5660:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/877#discussion_r128500608
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java ---
@@ -1851,9 +1860,81 @@ private static String relativize(String baseDir, String childPath) {
   .relativize(fullPathWithoutSchemeAndAuthority.toUri()));
   if (relativeFilePath.isAbsolute()) {
 throw new IllegalStateException(String.format("Path %s is not a subpath of %s.",
-basePathWithoutSchemeAndAuthority.toUri().toString(), fullPathWithoutSchemeAndAuthority.toUri().toString()));
+basePathWithoutSchemeAndAuthority.toUri().getPath(), fullPathWithoutSchemeAndAuthority.toUri().getPath()));
+  }
+  return relativeFilePath.toUri().getPath();
+}
+  }
+
+  /**
+   * Used to identify metadata version by the deserialization "metadata_version" first property
+   * from the metadata cache file
+   */
+  public static class MetadataVersion {
+@JsonProperty("metadata_version")
+public String textVersion;
+
+/**
+ * Supported metadata versions.
+ * Note: keep them synchronized with {@link ParquetTableMetadataBase} versions
+ */
+enum Versions {
+  v1(Constants.V1),
+  v2(Constants.V2),
+  v3(Constants.V3),
+  v3_1(Constants.V3_1);
+
+  private final String version;
+
+  Versions(String version) {
+this.version = version;
+  }
+
+  public String getVersion() {
+return version;
+  }
+
+  public static Versions fromString(String version) {
+for (Versions v : Versions.values()) {
+  if (v.version.equalsIgnoreCase(version)) {
+return v;
+  }
+}
+return null;
+  }
+
+  public static class Constants {
+public static final String V1 = "v1";
+public static final String V2 = "v2";
+public static final String V3 = "v3";
+public static final String V3_1 = "v3_1";
+  }
+}
+
+/**
+ * @param fs current file system
+ * @param path of metadata cache file
+ * @return true if metadata version is supported, false otherwise
+ * @throws IOException if parquet metadata can't be deserialized from the json file
+ */
+public static boolean isVersionSupported(FileSystem fs, Path path) throws IOException {
--- End diff --

Not an issue any more, since the new deserialization persistence class has 
been removed.
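A runnable, trimmed-down version of the `MetadataVersion.Versions` lookup from the diff is sketched below. Unlike the diff's `isVersionSupported(FileSystem fs, Path path)`, the hypothetical `isVersionSupported(String)` here takes the `metadata_version` string that has already been read from the cache file, so no Hadoop or Jackson classes are needed.

```java
/** Runnable sketch of the MetadataVersion.Versions lookup from the diff. */
final class MetadataVersionLookup {
  enum Versions {
    v1(Constants.V1),
    v2(Constants.V2),
    v3(Constants.V3),
    v3_1(Constants.V3_1);

    private final String version;

    Versions(String version) {
      this.version = version;
    }

    String getVersion() {
      return version;
    }

    /** Case-insensitive lookup; returns null for versions this build does not know. */
    static Versions fromString(String version) {
      for (Versions v : values()) {
        if (v.version.equalsIgnoreCase(version)) {
          return v;
        }
      }
      return null;
    }

    /** Compile-time constants, usable wherever a constant expression is required. */
    static final class Constants {
      static final String V1 = "v1";
      static final String V2 = "v2";
      static final String V3 = "v3";
      static final String V3_1 = "v3_1";
    }
  }

  /** Hypothetical variant of isVersionSupported that takes the already-read version string. */
  static boolean isVersionSupported(String textVersion) {
    return Versions.fromString(textVersion) != null;
  }
}
```

Returning `null` for an unknown string is what lets a caller treat a file from a newer Drill (for example one tagged `v4`) as unsupported instead of guessing at its layout.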



[jira] [Commented] (DRILL-5660) Drill 1.10 queries fail due to Parquet Metadata "corruption" from DRILL-3867

2017-07-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094997#comment-16094997
 ] 

ASF GitHub Bot commented on DRILL-5660:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/877#discussion_r128497802
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java ---
@@ -658,11 +657,21 @@ private boolean tableModified(List directories, Path metaFilePath,
 return false;
   }
 
+  /**
+   * Basic class for parquet metadata. Inheritors of this class are json serializable structures of
+   * different versions metadata cache files.
+   *
+   * Bump up metadata major version if metadata structure is changed.
+   * Bump up metadata minor version if only metadata content is changed, but metadata structure is the same.
+   *
+   * Note: keep metadata versions synchronized with {@link MetadataVersion.Versions}
+   */
   @JsonTypeInfo(use = JsonTypeInfo.Id.NAME, include = JsonTypeInfo.As.PROPERTY, property = "metadata_version")
   @JsonSubTypes({
-  @JsonSubTypes.Type(value = ParquetTableMetadata_v1.class, name="v1"),
-  @JsonSubTypes.Type(value = ParquetTableMetadata_v2.class, name="v2"),
-  @JsonSubTypes.Type(value = ParquetTableMetadata_v3.class, name="v3")
+  @JsonSubTypes.Type(value = ParquetTableMetadata_v1.class, name = MetadataVersion.Versions.Constants.V1),
--- End diff --

There is no way to use `getName()` here, since an annotation attribute value 
must be a constant. 
Using plain constants without the `enum` looks clearer. Thanks
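The constraint can be illustrated without Jackson. In this sketch the hypothetical `@TypeName` annotation stands in for an attribute such as `@JsonSubTypes.Type(name = ...)`: a `static final String` initialized with a literal is a constant expression and is accepted, whereas an enum method call such as `getVersion()` is not a constant expression and is rejected by the compiler.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/** Shows that annotation attribute values must be compile-time constants. */
final class AnnotationConstantDemo {
  static final class Constants {
    static final String V1 = "v1"; // constant expression: literal initializer
    static final String V2 = "v2";
  }

  // Hypothetical stand-in for an attribute like @JsonSubTypes.Type(name = ...).
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.TYPE)
  @interface TypeName {
    String value();
  }

  @TypeName(Constants.V1) // compiles: Constants.V1 is a constant
  static final class MetadataV1 {}

  @TypeName(Constants.V2)
  static final class MetadataV2 {}

  // Would not compile, because a method call is not a constant expression:
  // @TypeName(Versions.v1.getVersion())

  /** Reads the annotation back at run time; null if the class is not annotated. */
  static String typeNameOf(Class<?> type) {
    TypeName annotation = type.getAnnotation(TypeName.class);
    return annotation == null ? null : annotation.value();
  }
}
```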




[jira] [Commented] (DRILL-5660) Drill 1.10 queries fail due to Parquet Metadata "corruption" from DRILL-3867

2017-07-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094994#comment-16094994
 ] 

ASF GitHub Bot commented on DRILL-5660:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/877#discussion_r128495021
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java ---
@@ -139,13 +139,13 @@ public static ParquetTableMetadata_v3 getParquetTableMetadata(FileSystem fs,
* @return
* @throws IOException
*/
-  public static ParquetTableMetadataBase readBlockMeta(FileSystem fs, String path, MetadataContext metaContext, ParquetFormatConfig formatConfig) throws IOException {
+  public static ParquetTableMetadataBase readBlockMeta(FileSystem fs, Path path, MetadataContext metaContext, ParquetFormatConfig formatConfig) throws IOException {
--- End diff --

Done.




[jira] [Commented] (DRILL-5679) Document JAVA_HOME requirements for installing Drill in distributed mode

2017-07-20 Thread Arina Ielchiieva (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094989#comment-16094989
 ] 

Arina Ielchiieva commented on DRILL-5679:
-

As far as I remember from when I looked at this problem, the issue was not in the 
scripts but in the {{exec}} command. We do have double quotes everywhere in the scripts.
https://github.com/apache/drill/blob/master/distribution/src/resources/runbit#L107
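The symptom `exec: C:\Program: not found` is what word splitting on an unquoted path produces. As a Java analogy only (Drill's `runbit` is a shell script and this is not its code), `Runtime.exec(String)` tokenizes its command with a default `StringTokenizer` and has the same problem, while building the argument list explicitly, for example for a `ProcessBuilder`, keeps the path as one argument. The helper names are hypothetical, and the sketch only builds argument lists without starting a process.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

final class SpacesInJavaHome {
  /** Splits on whitespace like a default StringTokenizer; quotes are not honoured. */
  static List<String> naiveSplit(String commandLine) {
    List<String> tokens = new ArrayList<>();
    StringTokenizer st = new StringTokenizer(commandLine);
    while (st.hasMoreTokens()) {
      tokens.add(st.nextToken());
    }
    return tokens;
  }

  /** Builds arguments explicitly, so a path containing spaces stays one element. */
  static List<String> explicitArgs(String javaHome, String... rest) {
    List<String> args = new ArrayList<>();
    args.add(javaHome + "\\bin\\java");
    args.addAll(Arrays.asList(rest));
    return args; // suitable for new ProcessBuilder(args)
  }

  public static void main(String[] args) {
    String javaHome = "C:\\Program Files\\Java\\jdk1.7.0_71";
    System.out.println(naiveSplit(javaHome + "\\bin\\java -version").get(0)); // C:\Program
    System.out.println(explicitArgs(javaHome, "-version").get(0));
  }
}
```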

> Document JAVA_HOME requirements for installing Drill in distributed mode
> 
>
> Key: DRILL-5679
> URL: https://issues.apache.org/jira/browse/DRILL-5679
> Project: Apache Drill
>  Issue Type: Task
>Affects Versions: 1.10.0
>Reporter: Arina Ielchiieva
>Assignee: Bridget Bevens
>  Labels: doc-impacting
> Fix For: 1.11.0
>
>
> There is a general requirement that the JAVA_HOME variable should not contain 
> spaces.
> For example, during Drill installation in distributed mode on Windows, a user 
> can see the following error:
> {noformat}
> C:\Drill/bin/runbit: line 107: exec: C:\Program: not found
> {noformat}
> There are two options to fix this problem:
> {noformat}
> 1. Install Java in a directory without spaces.
> 2. Replace "Program Files" in your JAVA_HOME variable with progra~1, or with progra~2 
> if Java is installed under "Program Files (x86)".
> Example: JAVA_HOME="C:\progra~1\Java\jdk1.7.0_71"
> {noformat}





[jira] [Commented] (DRILL-5679) Document JAVA_HOME requirements for installing Drill in distributed mode

2017-07-20 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094955#comment-16094955
 ] 

Paul Rogers commented on DRILL-5679:


Or, we just fix the scripts to allow spaces.



[jira] [Updated] (DRILL-5083) RecordIterator can sometimes restart a query on close

2017-07-20 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-5083:

Reviewer: Paul Rogers

> RecordIterator can sometimes restart a query on close
> -
>
> Key: DRILL-5083
> URL: https://issues.apache.org/jira/browse/DRILL-5083
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Paul Rogers
>Assignee: Roman
>Priority: Minor
> Attachments: DrillOperatorErrorHandlingRedesign.pdf
>
>
> This one is very confusing...
> In a test with a MergeJoin and external sort, operators are stacked something 
> like this:
> {code}
> Screen
> - MergeJoin
> - - External Sort
> ...
> {code}
> Using the injector to force an OOM in spill, the external sort threw a 
> UserException up the stack. This was handled by:
> {code}
> IteratorValidatorBatchIterator.next( )
> RecordIterator.clearInflightBatches( )
> RecordIterator.close( )
> MergeJoinBatch.close( )
> {code}
> Which does the following:
> {code}
>   // Check whether next() should even have been called in current state.
>   if (null != exceptionState) {
> throw new IllegalStateException(
> {code}
> But, the exceptionState is set, so we end up throwing an 
> IllegalStateException during cleanup.
> The code should be consistent: if {{next( )}} can be called during cleanup, 
> then {{next( )}} should handle that case gracefully.
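One way to make the two sides agree is for the iterator to remember its first failure and answer later `next( )` calls with a terminal outcome instead of throwing. The sketch below is hypothetical and does not reproduce `IteratorValidatorBatchIterator` or `RecordIterator`; `FailSafeIterator`, `Outcome`, and `drain()` are illustrative names.

```java
/** Hypothetical iterator wrapper whose next() stays safe after a failure. */
final class FailSafeIterator {
  enum Outcome { OK, NONE, STOP }

  /** Upstream operator; may throw, e.g. on an out-of-memory condition. */
  interface Upstream {
    Outcome next();
  }

  private final Upstream upstream;
  private RuntimeException exceptionState; // first failure seen, or null

  FailSafeIterator(Upstream upstream) {
    this.upstream = upstream;
  }

  Outcome next() {
    if (exceptionState != null) {
      // Already failed: return a terminal outcome instead of throwing again,
      // so cleanup code that calls next() can finish.
      return Outcome.STOP;
    }
    try {
      return upstream.next();
    } catch (RuntimeException e) {
      exceptionState = e; // remember the failure
      throw e;            // and report it once, to the normal caller
    }
  }

  /** Consumes batches until a non-OK outcome, as close()-time cleanup might. */
  int drain() {
    int drained = 0;
    while (next() == Outcome.OK) {
      drained++;
    }
    return drained;
  }
}
```

The original exception still reaches the caller of the failing `next( )`; only the later calls made during cleanup are turned into a quiet `STOP`.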





[jira] [Commented] (DRILL-5083) RecordIterator can sometimes restart a query on close

2017-07-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094827#comment-16094827
 ] 

ASF GitHub Bot commented on DRILL-5083:
---

GitHub user KulykRoman opened a pull request:

https://github.com/apache/drill/pull/881

DRILL-5083: status.getOutcome() return FAILURE if one of the batches has STOP status (to avoid infinite loop in Merge Join).

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/KulykRoman/drill DRILL-5083

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/881.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #881


commit 2db3afd6b5bb8851b8bd938ae66a025c22540cbf
Author: Roman Kulyk 
Date:   2017-07-20T13:33:49Z

DRILL-5083: status.getOutcome() return FAILURE if one of the batches has STOP status (to avoid infinite loop in Merge Join).
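The rule in the commit message can be sketched as a small aggregation: if any input batch reports `STOP`, the combined status is `FAILURE`, so a loop that keeps calling the join until it stops returning data cannot spin forever. The enum names and the `combine` helper below are hypothetical, loosely modeled on the commit message and not taken from the patch.

```java
import java.util.List;

/** Hypothetical sketch: combine input outcomes, failing if any input stopped. */
final class JoinStatusSketch {
  enum BatchOutcome { OK, NONE, STOP }

  enum JoinOutcome { BATCH_RETURNED, NO_MORE_DATA, FAILURE }

  static JoinOutcome combine(List<BatchOutcome> inputs) {
    boolean anyOk = false;
    for (BatchOutcome outcome : inputs) {
      if (outcome == BatchOutcome.STOP) {
        return JoinOutcome.FAILURE; // one stopped input fails the whole join
      }
      if (outcome == BatchOutcome.OK) {
        anyOk = true;
      }
    }
    return anyOk ? JoinOutcome.BATCH_RETURNED : JoinOutcome.NO_MORE_DATA;
  }
}
```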






[jira] [Commented] (DRILL-5679) Document JAVA_HOME requirements for installing Drill in distributed mode

2017-07-20 Thread Arina Ielchiieva (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094646#comment-16094646
 ] 

Arina Ielchiieva commented on DRILL-5679:
-

FYI [~agirish] 



[jira] [Created] (DRILL-5679) Document JAVA_HOME requirements for installing Drill in distributed mode

2017-07-20 Thread Arina Ielchiieva (JIRA)
Arina Ielchiieva created DRILL-5679:
---

 Summary: Document JAVA_HOME requirements for installing Drill in distributed mode
 Key: DRILL-5679
 URL: https://issues.apache.org/jira/browse/DRILL-5679
 Project: Apache Drill
  Issue Type: Task
Affects Versions: 1.10.0
Reporter: Arina Ielchiieva
Assignee: Bridget Bevens
 Fix For: 1.11.0


There is a general requirement that the JAVA_HOME variable should not contain spaces.

For example, during Drill installation in distributed mode on Windows, a user can 
see the following error:
{noformat}
C:\Drill/bin/runbit: line 107: exec: C:\Program: not found
{noformat}

There are two options to fix this problem:
{noformat}
1. Install Java in a directory without spaces.
2. Replace "Program Files" in your JAVA_HOME variable with progra~1, or with progra~2 
if Java is installed under "Program Files (x86)".
Example: JAVA_HOME="C:\progra~1\Java\jdk1.7.0_71"
{noformat}





[jira] [Updated] (DRILL-5444) Document missing string functions

2017-07-20 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-5444:

Fix Version/s: 1.11.0

> Document missing string functions
> -
>
> Key: DRILL-5444
> URL: https://issues.apache.org/jira/browse/DRILL-5444
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Arina Ielchiieva
>Assignee: Khurram Faraaz
>Priority: Minor
>  Labels: doc-impacting
> Fix For: 1.11.0
>
>
> https://drill.apache.org/docs/string-manipulation/ does not contain the full list 
> of Drill string functions.
> For example, reverse, left, right, and two of the three variations of substr / 
> substring are missing.
> Source - 
> https://github.com/apache/drill/blob/72903d01424139057d4309ce6655e0aecee2573e/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/StringFunctions.java





[jira] [Assigned] (DRILL-5444) Document missing string functions

2017-07-20 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-5444:
---

Assignee: Bridget Bevens  (was: Khurram Faraaz)

> Document missing string functions
> -
>
> Key: DRILL-5444
> URL: https://issues.apache.org/jira/browse/DRILL-5444
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Arina Ielchiieva
>Assignee: Bridget Bevens
>Priority: Minor
>  Labels: doc-impacting
> Fix For: 1.11.0
>
>
> https://drill.apache.org/docs/string-manipulation/ does not contain the full list 
> of Drill string functions.
> For example, reverse, left, right, and two of the three variations of substr / 
> substring are missing.
> Source - 
> https://github.com/apache/drill/blob/72903d01424139057d4309ce6655e0aecee2573e/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/StringFunctions.java





[jira] [Updated] (DRILL-5666) Add information about UDF naming collision to Drill documentation

2017-07-20 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-5666:

Fix Version/s: 1.11.0

> Add information about UDF naming collision to Drill documentation
> -
>
> Key: DRILL-5666
> URL: https://issues.apache.org/jira/browse/DRILL-5666
> Project: Apache Drill
>  Issue Type: Task
>Affects Versions: 1.10.0
>Reporter: Arina Ielchiieva
>Assignee: Bridget Bevens
>Priority: Minor
>  Labels: doc-impacting
> Fix For: 1.11.0
>
>
> Add information about UDF naming collisions to the Drill documentation.
> The information below can be used as a model:
> By a duplicated UDF we mean a UDF with the same signature (name + input 
> parameters, including input mode, e.g. LOWER(VARCHAR-OPTIONAL)).
> If a duplicated function is found during drillbit startup, the drillbit will fail.
> For dynamic UDF upload, if a user tries to register a 
> duplicated function, registration will fail and the user will see an appropriate 
> error message.
> This error message will include the duplicated function signature and where 
> it was registered from (built-in, or the jar name if dynamic).
> Functions can be built-in or dynamic. A built-in function can be 
> registered from two places: from the drill-java-exec jar or from a custom jar (if 
> it is placed on the classpath, usually in the jars folder).
> If a function is built-in and was registered from the drill-java-exec jar, you 
> cannot replace it with your own unless you change the drill-java-exec jar (which 
> means you will have your own Drill version).
> If a function is built-in and was registered from a custom jar on the classpath, you 
> can remove or replace the custom jar and restart the drillbit.
> If a function is dynamic, you can use the DROP FUNCTION command and then register 
> a new function with the same signature. 
> Link to Drill documentation - 
> https://drill.apache.org/docs/develop-custom-functions/
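The collision rule above can be sketched as a registry keyed by signature (name plus input types with data mode). Everything in the sketch is hypothetical: the string form of a signature, the `register` and `dropJar` methods, and the error text are illustrative and are not Drill's function registry API. It shows a duplicate being rejected with its original source named in the message, and a dynamic jar's functions being released so the same signature can be registered again.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of duplicate-UDF detection by signature. */
final class UdfRegistrySketch {
  // Signature such as "LOWER(VARCHAR-OPTIONAL)" -> source ("built-in" or a jar name).
  private final Map<String, String> sources = new HashMap<>();

  /** Name plus input types (each including its data mode) identifies a function. */
  static String signature(String name, List<String> inputTypes) {
    return name + "(" + String.join(",", inputTypes) + ")";
  }

  /** Registers a function, failing if the same signature is already registered. */
  void register(String name, List<String> inputTypes, String source) {
    String sig = signature(name, inputTypes);
    String existing = sources.putIfAbsent(sig, source);
    if (existing != null) {
      throw new IllegalArgumentException(
          "Duplicated function " + sig + " (already registered from " + existing + ")");
    }
  }

  /** Loosely analogous to dropping a dynamic jar: forgets every function it registered. */
  void dropJar(String jarName) {
    sources.values().removeIf(jarName::equals);
  }
}
```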





[jira] [Updated] (DRILL-5610) Update Drill Team page with committers statuses

2017-07-20 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-5610:

Fix Version/s: 1.11.0

> Update Drill Team page with committers statuses
> ---
>
> Key: DRILL-5610
> URL: https://issues.apache.org/jira/browse/DRILL-5610
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Arina Ielchiieva
>Assignee: Bridget Bevens
>  Labels: doc-impacting
> Fix For: 1.11.0
>
>
> It would be helpful if the Team page included information about which 
> committers are the PMC Chair, PMC members, or regular committers.
> Link to Team page - https://drill.apache.org/team/ 





[jira] [Commented] (DRILL-4264) Dots in identifier are not escaped correctly

2017-07-20 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094260#comment-16094260
 ] 

Paul Rogers commented on DRILL-4264:


On the planner side, we represent field references with the {{FieldReference}} 
class you mentioned. {{FieldReference}} extends {{SchemaPath}}. These classes 
break names down into one object per name part.

Assume we have {{SELECT a.b.c, a."d.e" ...}}

Within the {{FieldReference}} itself, we hold onto the name using a 
{{PathSegment}} which has two subclasses: {{ArraySegment}} and {{NameSegment}}. 
So, as you noted, in the planner, we can tell the difference between the two 
cases (using functional notation):

{code}
a.b.c: FieldReference(NameSegment("a", NameSegment("b", NameSegment("c"))))
a."d.e": FieldReference(NameSegment("a", NameSegment("d.e")))
{code}

So far so good. But {{SchemaPath}} provides the {{getAsUnescapedPath()}} 
method, which concatenates the parts of the name using dots. We end up with two 
{{FieldReference}} instances. Calling {{getAsUnescapedPath()}} on each produces 
{{a.b.c}} and {{a.d.e}}. So, if anything actually uses this unescaped path, we 
end up with an ambiguity: does "a.d.e" represent one field, two fields or three 
fields? We cannot tell.
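The ambiguity can be reproduced with plain strings. This minimal illustration (not Drill code) models a path as a list of name parts and an "unescaped path" as those parts joined with dots, which is how the comment describes `getAsUnescapedPath()`.

```java
import java.util.Arrays;
import java.util.List;

/** Minimal illustration: dot-joining name parts loses the path structure. */
final class UnescapedPathAmbiguity {
  static String unescaped(List<String> parts) {
    return String.join(".", parts);
  }

  public static void main(String[] args) {
    List<String> threeParts = Arrays.asList("a", "d", "e"); // a.d.e
    List<String> twoParts = Arrays.asList("a", "d.e");      // a."d.e"
    System.out.println(unescaped(threeParts));              // a.d.e
    System.out.println(unescaped(twoParts));                // a.d.e
    // Splitting back on dots always yields three parts, which is wrong for twoParts.
    System.out.println(unescaped(twoParts).split("\\.").length); // 3
  }
}
```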

Now, if this method were only used for debugging (like {{toString()}}), it would 
be fine. But, in fact, many operators call this method, especially when 
creating the run-time representation of a field schema, {{MaterializedField}}:

From {{StreamingAggBatch}}:

{code}
  final MaterializedField outputField = MaterializedField.create(
ne.getRef().getAsUnescapedPath(), expr.getMajorType());
{code}

In our examples, we end up with two materialized fields: one called "a.b.c", 
the other "a.d.e", so the ambiguity persists.

As it turns out, each {{MaterializedField}} represents one field or structured 
column. So, our map "a" is represented by a {{MaterializedField}}, "b" by 
another, "c" by yet another and "d.e" by another. So, each should correspond to 
a single name part.

But the code doesn't work that way; it actually builds up the full unescaped 
name.

Now, I suspect that the code here is old and inconsistent. Creating a 
materialized field should pull out only one name part, but the code 
actually concatenates. My suspicion increases when I see methods like these in 
{{MaterializedField}}:

{code}
  public String getPath() { return getName(); }
  public String getLastName() { return getName(); }
  public String getName() { return name; }
{code}

That first one really worries me: it asks for the "path", which means a 
dotted name. There are many references to this method. Does this mean the code 
expects to get a string (not a {{NameSegment}}) that holds the composite name? 
If so, we are in trouble.

Now, as it turns out, it seems that in the "modern" form of the materialized schema 
each {{MaterializedField}} holds just one name part. So:

{code}
MaterializedField(name="a", children = (
  MaterializedField(name="b", children = (
    MaterializedField(name="c"))),
  MaterializedField(name="d.e")))
{code}
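If each field node holds exactly one name part, a lookup can walk the tree segment by segment and never needs to split strings. The `FieldTree` class below is a hypothetical stand-in for `MaterializedField`, used only to show that `a."d.e"` and `a.d.e` then resolve differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical field tree: one name part per node, resolved segment by segment. */
final class FieldTree {
  final String name;
  final List<FieldTree> children = new ArrayList<>();

  FieldTree(String name, FieldTree... kids) {
    this.name = name;
    children.addAll(Arrays.asList(kids));
  }

  /** Resolves a path given as separate segments; returns null if any segment is missing. */
  FieldTree resolve(List<String> segments) {
    FieldTree current = this;
    for (String segment : segments) {
      FieldTree next = null;
      for (FieldTree child : current.children) {
        if (child.name.equals(segment)) { // case-sensitive, for simplicity
          next = child;
          break;
        }
      }
      if (next == null) {
        return null;
      }
      current = next;
    }
    return current;
  }
}
```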

Because the code appears to have been written assuming that a 
{{MaterializedField}} has a path name, I wonder: does any code still rely on this 
fact and then split the name at dots to get fields?

If not, can we remove the {{getPath()}} and {{getLastName()}} methods to make 
it clear that each {{MaterializedField}} corresponds to a single 
{{NameSegment}}?

And, if we do that, should we remove all calls to 
{{SchemaPath.getAsUnescapedPath()}} to make clear that we never (except for 
display) want a dotted, combined path name?

By carefully looking at the above issues, we can be sure that no old code in 
Drill tries to concatenate "a" and "d.e" to get the name "a.d.e" which it then 
splits into "a", "d" and "e".

A quick search for ".split(" found a number of places where we split names on a 
dot, including in the Parquet Metadata file:

{code}
public Object deserializeKey(String key, 
com.fasterxml.jackson.databind.DeserializationContext ctxt)
throws IOException, 
com.fasterxml.jackson.core.JsonProcessingException {
  return new Key(key.split("\\."));
}
{code}

Are there others? Do these need to be fixed?
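One possible fix for code like `deserializeKey` is a reversible encoding: escape backslashes and dots inside each name part before joining, and split only on unescaped dots when reading. This is a sketch under that assumption, with hypothetical names; note that changing how keys are written would itself change the metadata file format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical reversible key codec: split only on unescaped dots. */
final class EscapedKeyCodec {
  /** Escapes '\' and '.' in each part, then joins the parts with '.'. */
  static String encode(List<String> parts) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < parts.size(); i++) {
      if (i > 0) {
        sb.append('.');
      }
      for (char c : parts.get(i).toCharArray()) {
        if (c == '\\' || c == '.') {
          sb.append('\\');
        }
        sb.append(c);
      }
    }
    return sb.toString();
  }

  /** Inverse of encode: an escaped character is literal, a bare '.' separates parts. */
  static List<String> decode(String key) {
    List<String> parts = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    boolean escaped = false;
    for (char c : key.toCharArray()) {
      if (escaped) {
        current.append(c);
        escaped = false;
      } else if (c == '\\') {
        escaped = true;
      } else if (c == '.') {
        parts.add(current.toString());
        current.setLength(0);
      } else {
        current.append(c);
      }
    }
    parts.add(current.toString());
    return parts;
  }

  public static void main(String[] args) {
    String key = encode(Arrays.asList("a", "d.e"));
    System.out.println(key);         // a.d\.e
    System.out.println(decode(key)); // [a, d.e]
  }
}
```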

> Dots in identifier are not escaped correctly
> 
>
> Key: DRILL-4264
> URL: https://issues.apache.org/jira/browse/DRILL-4264
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Codegen
>Reporter: Alex
>Assignee: Volodymyr Vysotskyi
>
> If you have some json data like this...
> {code:javascript}
> {
>   "0.0.1":{
> "version":"0.0.1",
> "date_created":"2014-03-15"
>   },
>   "0.1.2":{
> "version":"0.1.2",
> "date_created":"2014-05-21"
>   }
> }
>