Re: Single Hdfs block per parquet file

2017-03-22 Thread Padma Penumarthy
Yes, it seems it is possible to create files with different block sizes. We could potentially pass the configured store.parquet.block-size to the create call. I will try it out and see. Will let you know. Thanks, Padma > On Mar 22, 2017, at 4:16 PM, François Méthot

[GitHub] drill pull request #793: DRILL-4678: Tune metadata by generating a dispatche...

2017-03-22 Thread jinfengni
Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/793#discussion_r107561519 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillRelMdRowCount.java --- @@ -14,35 +14,71 @@ * WITHOUT WARRANTIES OR

[jira] [Resolved] (DRILL-5001) Join only supports implicit casts error even when I have explicit cast

2017-03-22 Thread Rahul Challapalli (JIRA)
[ https://issues.apache.org/jira/browse/DRILL-5001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rahul Challapalli resolved DRILL-5001. -- Resolution: Not A Bug Ok...this is not a bug. The underlying parquet data actually

Re: Single Hdfs block per parquet file

2017-03-22 Thread François Méthot
Here are 2 links I could find: http://archive.cloudera.com/cdh4/cdh/4/hadoop/api/org/apache/hadoop/fs/FileSystem.html#create(org.apache.hadoop.fs.Path,%20boolean,%20int,%20short,%20long)
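The `FileSystem.create` overload linked above takes an explicit per-file block size as its last argument. A minimal sketch of using it to make the HDFS block size match the Parquet row-group size (the path and the 512 MB size are illustrative assumptions, not values from the thread):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SingleBlockWriter {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Assumption: block size chosen to match the Parquet row-group
        // (store.parquet.block-size) so the file fits in one HDFS block.
        long blockSize = 512L * 1024 * 1024;
        int bufferSize = conf.getInt("io.file.buffer.size", 4096);
        short replication = fs.getDefaultReplication(new Path("/"));

        // create(Path, overwrite, bufferSize, replication, blockSize)
        FSDataOutputStream out = fs.create(
            new Path("/data/example.parquet"),  // hypothetical path
            true, bufferSize, replication, blockSize);
        // ... write the Parquet bytes here ...
        out.close();
    }
}
```

This only sets the HDFS block size for that one file; the cluster-wide `dfs.blocksize` default is untouched.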

[GitHub] drill pull request #793: DRILL-4678: Tune metadata by generating a dispatche...

2017-03-22 Thread Serhii-Harnyk
GitHub user Serhii-Harnyk opened a pull request: https://github.com/apache/drill/pull/793 DRILL-4678: Tune metadata by generating a dispatcher at runtime Changes for rebasing to Calcite 1.4.0-drill-r20 You can merge this pull request into a Git repository by running: $ git

Re: Single Hdfs block per parquet file

2017-03-22 Thread Padma Penumarthy
I think we create one file for each parquet block. If the underlying HDFS block size is 128 MB and the Parquet block size is > 128 MB, it will create more blocks on HDFS. Can you let me know which HDFS API would allow you to do otherwise? Thanks, Padma > On Mar 22, 2017, at 11:54 AM,

Single Hdfs block per parquet file

2017-03-22 Thread François Méthot
Hi, Is there a way to force Drill to store a CTAS-generated parquet file as a single block when using HDFS? The Java HDFS API allows this: files could be created with the Parquet block size. We are using Drill on hdfs configured with block size of 128MB. Changing this size is not an option at
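Whether a written file actually landed in a single HDFS block can be checked from its block locations. A hedged sketch (the file path is a hypothetical CTAS output name, not taken from the thread):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockCountCheck {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/data/ctas_output/0_0_0.parquet"); // hypothetical
        FileStatus st = fs.getFileStatus(file);
        // One BlockLocation per HDFS block the file occupies.
        BlockLocation[] blocks = fs.getFileBlockLocations(st, 0, st.getLen());
        System.out.println(file + " spans " + blocks.length + " block(s)");
    }
}
```

With the default 128 MB block size described in the thread, a Parquet file larger than 128 MB would report more than one block here.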

[jira] [Created] (DRILL-5376) Rationalize Drill's row structure for simpler code, better performance

2017-03-22 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-5376: -- Summary: Rationalize Drill's row structure for simpler code, better performance Key: DRILL-5376 URL: https://issues.apache.org/jira/browse/DRILL-5376 Project: Apache

[GitHub] drill pull request #792: DRILL-4971: query encounters system error: Statemen...

2017-03-22 Thread vdiravka
GitHub user vdiravka opened a pull request: https://github.com/apache/drill/pull/792 DRILL-4971: query encounters system error: Statement "break AndOP3" is not enclosed by a breakable statement with label "AndOP3" - New evaluated blocks for boolean operators should

[jira] [Created] (DRILL-5375) Nested loop join: return correct result for left join

2017-03-22 Thread Arina Ielchiieva (JIRA)
Arina Ielchiieva created DRILL-5375: --- Summary: Nested loop join: return correct result for left join Key: DRILL-5375 URL: https://issues.apache.org/jira/browse/DRILL-5375 Project: Apache Drill

Is it possible to delegate data joins and filtering to the datasource ?

2017-03-22 Thread Muhammad Gelbana
I'm trying to use Drill with a proprietary datasource that is very fast at applying data joins (i.e. SQL joins) and query filters (i.e. SQL where conditions). To connect to that datasource, I first have to write a storage plugin, but I'm not sure if my main goal is achievable. My main goal is