Re: Time for a 1.9 Release?

2016-10-28 Thread Parth Chandra
+1 on doing a release. I'm hoping the following get a +1: DRILL-4800 - Improve parquet reader performance; DRILL-3423 - Add New HTTPD format plugin; DRILL-4858 - REPEATED_COUNT on JSON containing an array of maps. Specifically, what did you want to discuss about the release number after 1.9?

[GitHub] drill issue #602: Improve Drill C++ connector

2016-10-28 Thread laurentgo
Github user laurentgo commented on the issue: https://github.com/apache/drill/pull/602 Branch updated with only commits associated with a JIRA (everything else has been merged into commit for DRILL-4420) --- If your project is set up for it, you can reply to this email and have your

[GitHub] drill issue #602: Improve Drill C++ connector

2016-10-28 Thread laurentgo
Github user laurentgo commented on the issue: https://github.com/apache/drill/pull/602 yes, let me clean my branch one last time by squashing the small commits with no JIRA. For the windows build, I already added instructions regarding the need for PowerShell on the system, and on

Re: Time for a 1.9 Release?

2016-10-28 Thread Sudheesh Katkam
Let's aim for EOD next Friday (11/04/16) to get all changes in; I will try to get RC0 out on Monday (11/07/16). Current list of commits: [Sudheesh] + DRILL-4280: pull request being reviewed https://github.com/apache/drill/pull/578 [Jinfeng] + DRILL-1950: pull request pending Any other pull

[GitHub] drill issue #602: Improve Drill C++ connector

2016-10-28 Thread parthchandra
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/602 +1. Everything looks good. I'm assuming that you will squash the commits which don't have an associated Jira. Also, if you can add any notes needed to the Windows build instructions, that

[GitHub] drill pull request #635: DRILL-4927 (part 2): Add support for Null Equality ...

2016-10-28 Thread amansinha100
Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/635#discussion_r85625856 --- Diff: exec/java-exec/src/test/java/org/apache/drill/TestJoinNullable.java --- @@ -493,4 +496,81 @@ public void withNullEqualAdditionFilter() throws

[jira] [Created] (DRILL-4981) TPC-DS Query 75 fails on MapR-DB JSON Tables

2016-10-28 Thread Abhishek Girish (JIRA)
Abhishek Girish created DRILL-4981: -- Summary: TPC-DS Query 75 fails on MapR-DB JSON Tables Key: DRILL-4981 URL: https://issues.apache.org/jira/browse/DRILL-4981 Project: Apache Drill Issue

Re: isDateCorrect field in ParquetTableMetadata

2016-10-28 Thread Jason Altekruse
The only worry I have about declaring a writer version is possible confusion with the Parquet format version itself. The format is already defined through version 2.1 or something like that, but we are currently only writing files based on the 1.x version of the format. My preferred solution to

Re: isDateCorrect field in ParquetTableMetadata

2016-10-28 Thread Jinfeng Ni
Vitalii, just to confirm: you will "remove" the isDateCorrect flag and use the parquet-writer version instead, correct? On Fri, Oct 28, 2016 at 2:52 PM, Vitalii Diravka wrote: > Jinfeng, > > isDateCorrect will be false in the code when isDateCorrect property is > absent

Re: isDateCorrect field in ParquetTableMetadata

2016-10-28 Thread Vitalii Diravka
Jinfeng, isDateCorrect will be false in the code when the isDateCorrect property is absent from the Parquet metadata. Anyway, I am going to implement the mentioned approach with parquet-writer.version instead of the isDateCorrect property.

[GitHub] drill issue #602: Improve Drill C++ connector

2016-10-28 Thread laurentgo
Github user laurentgo commented on the issue: https://github.com/apache/drill/pull/602 sounds good, I think all of the comments I received have been addressed.

Re: to_date(csv-columns[x],'yyyy-mm-dd') - IllegalArgumentException

2016-10-28 Thread Veera Naranammalpuram
I would expect it to work. You should just reopen DRILL-3214. You have already created one for this. -Veera On Fri, Oct 28, 2016 at 3:08 PM, Andries Engelbrecht < aengelbre...@maprtech.com> wrote: > You want to use MM for month and not mm for minute as mm can produce the > wrong result. > >

Re: isDateCorrect field in ParquetTableMetadata

2016-10-28 Thread Paul Rogers
I like the proposal. The Parquet Writer version should just be 2 (no .0.0 as we won’t have major or minor versions.) With things like writer versions (or RPC versions, etc.) the usual rule is to use increasing integers. I am surprised that the other tools don’t include more detail about the
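A rough Java sketch of the "increasing integer" writer-version rule proposed here. Everything in it is an assumption made for illustration: the metadata key is borrowed from the parquet-writer.version name mentioned in this thread, the threshold of 2 is the value suggested above, and the class and method names are hypothetical. It is not Drill's actual implementation.

```java
import java.util.Collections;
import java.util.Map;

public class WriterVersionCheck {
    // Hypothetical metadata key, borrowed from the property name used in the thread.
    static final String KEY = "parquet-writer.version";
    // Hypothetical threshold: first writer version assumed to write correct dates.
    static final int FIRST_CORRECT_VERSION = 2;

    // Returns the writer version as a plain integer; 0 when absent or unparsable.
    static int writerVersion(Map<String, String> meta) {
        String raw = meta.get(KEY);
        if (raw == null) {
            return 0; // file written before the property existed
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return 0; // treat malformed values like a missing property
        }
    }

    // Format decisions key off a single integer comparison.
    static boolean datesKnownCorrect(Map<String, String> meta) {
        return writerVersion(meta) >= FIRST_CORRECT_VERSION;
    }

    public static void main(String[] args) {
        System.out.println(datesKnownCorrect(Collections.singletonMap(KEY, "2")));
        System.out.println(datesKnownCorrect(Collections.<String, String>emptyMap()));
    }
}
```

With a single integer there is no major/minor ordering to interpret: a reader only needs one comparison, and a missing property falls below every threshold.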

Re: isDateCorrect field in ParquetTableMetadata

2016-10-28 Thread Jinfeng Ni
Thanks for the explanation, Jason. The three different values for DateCorruptionStatus make sense to me. The isDateCorrect flag = true means that the values are known to be correct. Does the isDateCorrect flag = false mean that the values are known to be incorrect, or unclear? On Fri, Oct 28,

Re: isDateCorrect field in ParquetTableMetadata

2016-10-28 Thread Vitalii Diravka
I explored the metadata of parquet files generated by different tools: *Impala:* creator: impala version 2.2.0-cdh5.4.5 (build 4a81c1d04c39961ef14ff6131d543dd96ef60e6e) *Hive:* creator: parquet-mr version 1.6.0 *Pig:* creator: parquet-mr version 1.5.1-SNAPSHOT extra:

Re: isDateCorrect field in ParquetTableMetadata

2016-10-28 Thread Jason Altekruse
The isDateCorrect flag means that the values are known to be correct, and there is no need to auto-detect corruption or correct anything. META_SHOWS_CORRUPTION can be set either when we have a known old version of Drill written in the metadata, or we have older files that might have been written

Re: isDateCorrect field in ParquetTableMetadata

2016-10-28 Thread Jinfeng Ni
Hi Vitalii, DateCorruptionStatus has three possibilities: META_SHOWS_CORRUPTION, META_SHOWS_NO_CORRUPTION, META_UNCLEAR_TEST_VALUES. What value will this isDateCorrect flag have for each possibility, especially for META_UNCLEAR_TEST_VALUES? Are DateCorruptionStatus and isDateCorrect the same thing,

Re: to_date(csv-columns[x],'yyyy-mm-dd') - IllegalArgumentException

2016-10-28 Thread Andries Engelbrecht
You want to use MM for month and not mm for minute, as mm can produce the wrong result. Probably best to file an enhancement JIRA to have the function handle empty fields and produce a null value. Then the wider audience can review the merit for implementation. --Andries > On Oct 28, 2016,
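A small Java sketch of the MM versus mm distinction raised here. It uses the JDK's SimpleDateFormat as a stand-in, which is an assumption: Drill's to_date may rely on a different parser, and the class and method names below are illustrative only.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

public class MonthVsMinute {
    // Parse `input` with `pattern`, then print the result with an unambiguous pattern.
    static String reparse(String input, String pattern) {
        SimpleDateFormat parser = new SimpleDateFormat(pattern, Locale.ROOT);
        SimpleDateFormat printer = new SimpleDateFormat("yyyy-MM-dd HH:mm", Locale.ROOT);
        // Fix the zone so the result does not depend on the local machine.
        parser.setTimeZone(TimeZone.getTimeZone("UTC"));
        printer.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            return printer.format(parser.parse(input));
        } catch (ParseException e) {
            throw new IllegalArgumentException("cannot parse: " + input, e);
        }
    }

    public static void main(String[] args) {
        // MM: the middle field is read as the month.
        System.out.println(reparse("2016-10-28", "yyyy-MM-dd")); // 2016-10-28 00:00
        // mm: the middle field is read as minutes; the month falls back to January.
        System.out.println(reparse("2016-10-28", "yyyy-mm-dd")); // 2016-01-28 00:10
    }
}
```

With SimpleDateFormat the second call does not fail: the month silently defaults to January and the 10 lands in the minutes field, which is why a query can return rows with either pattern while only one of them is right.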

Re: Time for a 1.9 Release?

2016-10-28 Thread Jinfeng Ni
+1 I'm working on DRILL-1950 to support parquet row group level filter pruning. I plan to submit a pull request for code review in 1-2 days, hopefully. On Fri, Oct 28, 2016 at 11:04 AM, Aman Sinha wrote: > +1 > > On Fri, Oct 28, 2016 at 10:34 AM, Sudheesh Katkam

[jira] [Resolved] (DRILL-4968) Add column size information to ColumnMetadata

2016-10-28 Thread Sudheesh Katkam (JIRA)
[ https://issues.apache.org/jira/browse/DRILL-4968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sudheesh Katkam resolved DRILL-4968. Resolution: Fixed Fix Version/s: 1.9.0 Fixed in

[GitHub] drill pull request #635: DRILL-4927 (part 2): Add support for Null Equality ...

2016-10-28 Thread KulykRoman
GitHub user KulykRoman opened a pull request: https://github.com/apache/drill/pull/635 DRILL-4927 (part 2): Add support for Null Equality Joins (mixed comparators) These changes are a subset of the original pull request from DRILL-4539 (PR-462). - Added changes

Re: Time for a 1.9 Release?

2016-10-28 Thread Aman Sinha
+1 On Fri, Oct 28, 2016 at 10:34 AM, Sudheesh Katkam wrote: > Hi Drillers, > > We have a reasonable number of fixes and features since the last release > [1]. Releasing itself takes a while; so I propose we start the 1.9 release > process. > > I volunteer as the release

[GitHub] drill issue #629: DRILL-4967: Adding template_name to source code generated ...

2016-10-28 Thread amansinha100
Github user amansinha100 commented on the issue: https://github.com/apache/drill/pull/629 +1

[GitHub] drill issue #633: DRILL-4972: Remove setDaemon(true) call in WorkManager.Sta...

2016-10-28 Thread parthchandra
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/633 +1. LGTM

Re: ZK lost connectivity issue on large cluster

2016-10-28 Thread Padma Penumarthy
Hi Francois, Thank you for the picture and the info you provided. We will keep you updated and let you know when we make changes in a future release. Thanks, Padma > On Oct 26, 2016, at 6:06 PM, François Méthot wrote: > > Hi, > > Sorry it took so long, lost the origin

[GitHub] drill issue #628: DRILL-4964: Drill fails to connect to hive metastore after...

2016-10-28 Thread sohami
Github user sohami commented on the issue: https://github.com/apache/drill/pull/628 @jinfengni - Thanks for review.

[GitHub] drill pull request #628: DRILL-4964: Drill fails to connect to hive metastor...

2016-10-28 Thread sohami
Github user sohami closed the pull request at: https://github.com/apache/drill/pull/628

[GitHub] drill issue #628: DRILL-4964: Drill fails to connect to hive metastore after...

2016-10-28 Thread jinfengni
Github user jinfengni commented on the issue: https://github.com/apache/drill/pull/628 +1 LGTM.

Re: isDateCorrect field in ParquetTableMetadata

2016-10-28 Thread Paul Rogers
Thanks Vitalii. The Parquet Writer solution “just works”. As soon as someone upgrades the writer, files are labeled as having that new version. No fuzziness during a release as in 1.9. It is fine to also include the Drill version. But, format decisions should be keyed off of the writer

Re: TO_TIMESTAMP function returns in-correct results

2016-10-28 Thread Khurram Faraaz
Works, thanks! 0: jdbc:drill:schema=dfs.tmp> VALUES(TO_TIMESTAMP('2015-03-30 20:49:59.10 UTC', 'yyyy-MM-dd HH:mm:ss.SSS z')); ++ | EXPR$0 | ++ | 2015-03-30 20:49:59.1 | ++ 1 row selected (0.245 seconds) On

Re: to_date(csv-columns[x],'yyyy-mm-dd') - IllegalArgumentException

2016-10-28 Thread Khurram Faraaz
Thanks Andries and Veera. 1. Yes, my CSV file does have empty strings in some rows in columns[4]. 2. It worked for parquet because I had used the case expression to cast empty strings to NULL. 3. I tried with 'yyyy-mm-dd' and 'yyyy-MM-dd' and to_date returned results with both representations.

[GitHub] drill pull request #633: DRILL-4972: Set WorkManager.StatusThread's daemon f...

2016-10-28 Thread sohami
Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/633#discussion_r85557510 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/WorkManager.java --- @@ -295,7 +295,7 @@ public FragmentExecutor getFragmentRunner(final

Re: to_date(csv-columns[x],'yyyy-mm-dd') - IllegalArgumentException

2016-10-28 Thread Andries Engelbrecht
Good catch on empty string Veera! Wouldn't it be cheaper to check for an empty string? case when columns[] ='' then null else to_date(columns[],'yyyy-MM-dd') end I don't think the option to read csv empty columns (or empty string in any text reader) as null is in the reader yet. So we can't
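The empty-string guard suggested here can be sketched in Java as follows. This is only an illustration of the CASE logic, assuming java.time as the parser; the class and method names are hypothetical, and Drill's own to_date may behave differently on malformed input.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class EmptyToNullDate {
    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyy-MM-dd");

    // Mirrors: CASE WHEN col = '' THEN NULL ELSE to_date(col, 'yyyy-MM-dd') END
    // (a null input is also passed through as null, which the SQL form gets for free).
    static LocalDate parseOrNull(String field) {
        if (field == null || field.isEmpty()) {
            return null; // cheap check, no parsing and no exception
        }
        return LocalDate.parse(field, FMT);
    }

    public static void main(String[] args) {
        System.out.println(parseOrNull(""));           // null
        System.out.println(parseOrNull("2016-10-28")); // 2016-10-28
    }
}
```

The comparison against the empty string runs before the parser is touched, so empty CSV fields never reach the code path that raises the exception.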

Re: to_date(csv-columns[x],'yyyy-mm-dd') - IllegalArgumentException

2016-10-28 Thread Veera Naranammalpuram
Do you have zero-length strings in your data? I have seen cases where the system option to cast empty strings to NULL doesn't work as advertised. You should re-open DRILL-3214. When I run into this problem, I usually use a regex to work around it. The PROJECT takes a performance hit when you do this

Re: isDateCorrect field in ParquetTableMetadata

2016-10-28 Thread Vitalii Diravka
I agree that it would be good to upgrade the approach to parquet date correctness detection, so I created a JIRA for it: DRILL-4980. But now we have two ideas: 1. To additionally check the Drill version, so later we can delete

[GitHub] drill pull request #633: DRILL-4972: Set WorkManager.StatusThread's daemon f...

2016-10-28 Thread sudheeshkatkam
Github user sudheeshkatkam commented on a diff in the pull request: https://github.com/apache/drill/pull/633#discussion_r85552294 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/WorkManager.java --- @@ -295,7 +295,7 @@ public FragmentExecutor

[jira] [Created] (DRILL-4980) Upgrading of the approach of parquet date correctness status detection

2016-10-28 Thread Vitalii Diravka (JIRA)
Vitalii Diravka created DRILL-4980: -- Summary: Upgrading of the approach of parquet date correctness status detection Key: DRILL-4980 URL: https://issues.apache.org/jira/browse/DRILL-4980 Project:

Re: TO_TIMESTAMP function returns in-correct results

2016-10-28 Thread Serhii Harnyk
Example of the usage with "s" for second of minute and "S" for fraction of second: VALUES(TO_TIMESTAMP('2015-03-30 20:49:59.10 UTC', 'yyyy-MM-dd HH:mm:ss.SSS z')) 2016-10-28 16:17 GMT+03:00 Khurram Faraaz : > Thanks Serhii. > > Can you please give me a working example of

to_date(csv-columns[x],'yyyy-mm-dd') - IllegalArgumentException

2016-10-28 Thread Khurram Faraaz
All, the question is: why does it work for a parquet column but fail when a CSV column is used? Drill 1.9.0 commit : a29f1e29 This is a simple project of a column from a csv file, which works. {noformat} 0: jdbc:drill:schema=dfs.tmp> select columns[4] FROM `typeall_l.csv` t1 limit 5; +-+ |

Re: TO_TIMESTAMP function returns in-correct results

2016-10-28 Thread Khurram Faraaz
Thanks Serhii. Can you please give me a working example of the usage with "s" for second of minute and "S" for fraction of second. I tried with both those symbols, however Drill 1.9.0 (commit: a29f1e29) does not honor those symbols when used from within the to_date function. On Thu, Oct 27,

[jira] [Created] (DRILL-4979) Make dataport configurable

2016-10-28 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created DRILL-4979: -- Summary: Make dataport configurable Key: DRILL-4979 URL: https://issues.apache.org/jira/browse/DRILL-4979 Project: Apache Drill Issue Type: New Feature

[jira] [Created] (DRILL-4978) Parquet metadata cache on S3 is always renewed

2016-10-28 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created DRILL-4978: -- Summary: Parquet metadata cache on S3 is always renewed Key: DRILL-4978 URL: https://issues.apache.org/jira/browse/DRILL-4978 Project: Apache Drill Issue Type:

[jira] [Created] (DRILL-4977) Reading parquet metadata cache from S3 with fadvise=random and Hadoop 3 generates a large number of requests

2016-10-28 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created DRILL-4977: -- Summary: Reading parquet metadata cache from S3 with fadvise=random and Hadoop 3 generates a large number of requests Key: DRILL-4977 URL:

[jira] [Created] (DRILL-4976) Querying Parquet files on S3 pulls

2016-10-28 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created DRILL-4976: -- Summary: Querying Parquet files on S3 pulls Key: DRILL-4976 URL: https://issues.apache.org/jira/browse/DRILL-4976 Project: Apache Drill Issue Type: Improvement