[GitHub] drill issue #976: DRILL-5797: Choose parquet reader from read columns

2017-11-14 Thread priteshm
Github user priteshm commented on the issue:

https://github.com/apache/drill/pull/976
  
@dprofeta, will you be able to address the issues before the release?


---


[GitHub] drill issue #976: DRILL-5797: Choose parquet reader from read columns

2017-11-01 Thread paul-rogers
Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/976
  
Drill follows SQL rules and is case insensitive. If case sensitivity has 
snuck in somewhere (perhaps due to the use of `equals()` rather than 
`equalsIgnoreCase()` or the use of a case-sensitive map), then we should fix 
that.
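
To illustrate the two pitfalls named above, here is a minimal, self-contained sketch in plain Java. The class and method names (`CaseSensitivityDemo`, `sameColumn`, `caseInsensitiveMap`) are hypothetical and are not part of Drill; only the JDK calls are real.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical demo class; not part of Drill. */
final class CaseSensitivityDemo {

  /** Compares two column names the way SQL identifiers are expected to match. */
  static boolean sameColumn(String a, String b) {
    return a.equalsIgnoreCase(b);
  }

  /** Builds a map whose key lookups ignore case (a sorted map, not a hash map). */
  static Map<String, String> caseInsensitiveMap() {
    return new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
  }

  public static void main(String[] args) {
    // equals() is case sensitive, equalsIgnoreCase() is not.
    System.out.println("XYZ".equals("xyz"));      // prints false
    System.out.println(sameColumn("XYZ", "xyz")); // prints true

    // A plain HashMap misses a key that differs only in case.
    Map<String, String> sensitive = new HashMap<>();
    sensitive.put("XYZ", "complex");
    System.out.println(sensitive.containsKey("xyz")); // prints false

    // The case-insensitive map finds it.
    Map<String, String> insensitive = caseInsensitiveMap();
    insensitive.put("XYZ", "complex");
    System.out.println(insensitive.get("xyz")); // prints complex
  }
}
```

Either approach works in isolation; the problem arises when one component uses a case-insensitive lookup and another uses a case-sensitive one for the same column names.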

Note also that column aliases should not be visible to the Parquet reader.


---


[GitHub] drill issue #976: DRILL-5797: Choose parquet reader from read columns

2017-11-01 Thread sachouche
Github user sachouche commented on the issue:

https://github.com/apache/drill/pull/976
  
Looking at the stack trace:
- The code definitely is initializing a column of type REPEATABLE
- The Fast Reader didn't expect this scenario, so it used a default 
container (NullableVarBinary) for the variable-length binary data type

Why is this happening?
- The code in ReadState::buildReader() is processing all selected columns
- This information is obtained from the ParquetSchema
- Looking at the code, this seems to be a case-sensitivity issue
- The ParquetSchema is case-insensitive whereas the Parquet GroupType is not
- Damien added a catch handler (column not found) to handle use-cases where 
we are projecting non-existing columns
- This basically is leading to an unforeseen use-case
- Assume column XYZ is complex
- User uses an alias (xyz)
- The new code will allow this column to pass and treat it as simple
- The ParquetSchema, being case-insensitive, will process this column
- Hence the exception in the test suite

Suggested Fix
- Create a map (key to-lower-case) and register all current row-group 
columns
- Use this map to locate a selected column type
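
The suggested fix could be sketched roughly as follows. This is only an illustration in plain Java, not the actual patch: `RowGroupColumnIndex`, `Kind`, `register`, `lookup`, and `canUseFastReader` are hypothetical names, and the real code would take the column types from the Parquet row-group schema rather than from hand-registered entries. It also assumes that selected columns missing from the file should be tolerated, as the catch handler mentioned above does.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of the lower-cased lookup map; not Drill or Parquet API. */
final class RowGroupColumnIndex {

  /** Hypothetical stand-in for the type information of a row-group column. */
  enum Kind { SIMPLE, COMPLEX }

  // Keys are stored lower-cased so lookups ignore the case used in the query.
  private final Map<String, Kind> byLowerName = new HashMap<>();

  /** Registers one column of the current row group. */
  void register(String name, Kind kind) {
    byLowerName.put(name.toLowerCase(Locale.ROOT), kind);
  }

  /** Returns the kind of a selected column, or null if the file has no such column. */
  Kind lookup(String selectedName) {
    return byLowerName.get(selectedName.toLowerCase(Locale.ROOT));
  }

  /** The fast reader is only safe when no selected column resolves to a complex type. */
  boolean canUseFastReader(List<String> selectedNames) {
    for (String name : selectedNames) {
      if (lookup(name) == Kind.COMPLEX) {
        return false;
      }
    }
    return true; // columns missing from the file are tolerated, per the assumption above
  }

  public static void main(String[] args) {
    RowGroupColumnIndex index = new RowGroupColumnIndex();
    index.register("XYZ", Kind.COMPLEX);
    index.register("id", Kind.SIMPLE);

    System.out.println(index.lookup("xyz"));                                  // prints COMPLEX
    System.out.println(index.canUseFastReader(Arrays.asList("xyz", "id")));    // prints false
    System.out.println(index.canUseFastReader(Arrays.asList("ID", "missing"))); // prints true
  }
}
```

With such a map, a query that refers to `xyz` resolves to the complex column `XYZ` instead of falling through the column-not-found path and being treated as simple.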



---


[GitHub] drill issue #976: DRILL-5797: Choose parquet reader from read columns

2017-10-30 Thread paul-rogers
Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/976
  
@dprofeta, I tried to commit this PR, but ran into multiple functional test 
failures:

```
Execution Failures:

/root/drillAutomation/mapr/framework/resources/Functional/complex/parquet/complex12.q

/root/drillAutomation/mapr/framework/resources/Functional/complex/parquet/complex8.q

/root/drillAutomation/mapr/framework/resources/Functional/complex/parquet/complex56.q

/root/drillAutomation/mapr/framework/resources/Functional/complex/parquet/complex274.q

/root/drillAutomation/mapr/framework/resources/Functional/complex/parquet/complex7.q

/root/drillAutomation/mapr/framework/resources/Functional/complex/parquet/complex57.q

/root/drillAutomation/mapr/framework/resources/Functional/complex/parquet/complex102.q

/root/drillAutomation/mapr/framework/resources/Functional/complex/parquet/complex5.q

/root/drillAutomation/mapr/framework/resources/Functional/complex/parquet/complex10.q

/root/drillAutomation/mapr/framework/resources/Functional/complex/parquet/complex9.q

/root/drillAutomation/mapr/framework/resources/Functional/complex/parquet/complex203.q

/root/drillAutomation/mapr/framework/resources/Functional/complex/parquet/complex101.q

/root/drillAutomation/mapr/framework/resources/Functional/complex/parquet/complex275.q

/root/drillAutomation/mapr/framework/resources/Functional/complex/parquet/complex6.q

/root/drillAutomation/mapr/framework/resources/Functional/complex/parquet/complex205.q

/root/drillAutomation/mapr/framework/resources/Functional/complex/parquet/complex11.q

/root/drillAutomation/mapr/framework/resources/Functional/complex/parquet/complex58.q

/root/drillAutomation/mapr/framework/resources/Functional/complex/parquet/complex153.q

/root/drillAutomation/mapr/framework/resources/Functional/complex/parquet/complex202.q

/root/drillAutomation/mapr/framework/resources/Functional/complex/parquet/complex151.q
```

The common failure stack trace seems to be:

```
org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.handleException():272
org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.setup():256
org.apache.drill.exec.physical.impl.ScanBatch.getNextReaderIfHas():241
org.apache.drill.exec.physical.impl.ScanBatch.next():167
...
``` 


---


[GitHub] drill issue #976: DRILL-5797: Choose parquet reader from read columns

2017-10-29 Thread sachouche
Github user sachouche commented on the issue:

https://github.com/apache/drill/pull/976
  
+1
looks good!


---


[GitHub] drill issue #976: DRILL-5797: Choose parquet reader from read columns

2017-10-29 Thread priteshm
Github user priteshm commented on the issue:

https://github.com/apache/drill/pull/976
  
@sachouche can you please take a final look? If it looks good, maybe one of 
the committers can include this for the 1.12 release. @arina-ielchiieva ?


---


[GitHub] drill issue #976: DRILL-5797: Choose parquet reader from read columns

2017-10-17 Thread dprofeta
Github user dprofeta commented on the issue:

https://github.com/apache/drill/pull/976
  
I updated the javadoc with Paul's remarks.


---


[GitHub] drill issue #976: DRILL-5797: Choose parquet reader from read columns

2017-10-16 Thread dprofeta
Github user dprofeta commented on the issue:

https://github.com/apache/drill/pull/976
  
Here is the updated PR.
Yes, I also wanted to add group without repetition. It is only a matter of 
naming, so it should not be hard, but when I tested, the fast reader was not able 
to handle it.


---


[GitHub] drill issue #976: DRILL-5797: Choose parquet reader from read columns

2017-10-06 Thread sachouche
Github user sachouche commented on the issue:

https://github.com/apache/drill/pull/976
  
Sure!


Regards,

Salim




---


[GitHub] drill issue #976: DRILL-5797: Choose parquet reader from read columns

2017-10-06 Thread dprofeta
Github user dprofeta commented on the issue:

https://github.com/apache/drill/pull/976
  
@sachouche Can you review it?


---