Re: Maven build failing on checkstyle

2015-09-10 Thread Edmon Begoli
Long story short (sorry :-|) - does it make sense to have a build stopping check for \s+$ in a javadoc and not check and stop the build for missing or improper javadoc? On Thursday, September 10, 2015, Edmon Begoli wrote: > My observation on this, as a newcomer, is that it is

Re: Maven build failing on checkstyle

2015-09-10 Thread Jacques Nadeau
Hey Edmon, I completely agree that Checkstyle can be a pain. I've worked on a couple projects where the rules are truly draconian. In Drill today, I think we have only the following rules: 1) No trailing whitespace 2) No If statements without brackets (other than ternary) 3) No imports of the
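The first rule is the one Edmon ran into. A regex such as \s+$ is applied line by line, so it flags javadoc comment lines exactly as it flags code lines. The sketch below reproduces that behavior in plain Java. It assumes semantics similar to a Checkstyle RegexpSingleline check with format \s+$, and it is an illustration rather than Drill's actual checkstyle configuration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Sketch of a per-line trailing-whitespace check (assumed RegexpSingleline-like semantics). */
final class TrailingWhitespaceCheck {
  // Same pattern as in the thread: one or more whitespace characters at the end of a line.
  private static final Pattern TRAILING = Pattern.compile("\\s+$");

  /** Returns the 1-based numbers of lines that end in whitespace, comment lines included. */
  static List<Integer> violations(String source) {
    List<Integer> lines = new ArrayList<>();
    String[] split = source.split("\n", -1);
    for (int i = 0; i < split.length; i++) {
      if (TRAILING.matcher(split[i]).find()) {
        lines.add(i + 1);
      }
    }
    return lines;
  }

  public static void main(String[] args) {
    // Line 1 is a javadoc line with a trailing space; line 4 is code with a trailing tab.
    String src = "/** Javadoc line with trailing space. \n * ok\n */\nclass A {}\t\n";
    System.out.println(violations(src)); // [1, 4]
  }
}
```

Because the check never distinguishes comments from code, a trailing space inside javadoc fails the build just as one after a statement would.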

Re: Maven build failing on checkstyle

2015-09-10 Thread Chris Westin
Since most editors and IDEs will strip trailing whitespace everywhere in a file (code or comments), we should leave it in to avoid getting spurious diffs. On Thu, Sep 10, 2015 at 7:02 AM, Jacques Nadeau wrote: > Hey Edmon, > > I completely agree that Checkstyle can be a

Re: Maven build failing on checkstyle

2015-09-10 Thread Edmon Begoli
My opinion is to keep trailing spaces checks in as they are. On Thursday, September 10, 2015, Jacques Nadeau wrote: > Hey Edmon, > > I completely agree that Checkstyle can be a pain. I've worked on a couple > projects where the rules are truly draconian. In Drill today, I

Re: Maven build failing on checkstyle

2015-09-10 Thread Ted Dunning
On Thu, Sep 10, 2015 at 7:02 AM, Jacques Nadeau wrote: > I think that the trailing > whitespace check does provide great use in code so I'd prefer to keep it. > > What do you think? > I think that we should have a rule that every class should have javadoc on the class.

[jira] [Resolved] (DRILL-3746) Hive query fails if the table contains external partitions

2015-09-10 Thread Venki Korukanti (JIRA)
[ https://issues.apache.org/jira/browse/DRILL-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-3746. Resolution: Fixed > Hive query fails if the table contains external partitions >

[GitHub] drill pull request: DRILL-3746: Get Hive partition values from Met...

2015-09-10 Thread amansinha100
Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/151#discussion_r39176878 --- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/HivePartitionDescriptor.java --- @@ -62,7 +62,7 @@ public

[GitHub] drill pull request: DRILL-3746: Get Hive partition values from Met...

2015-09-10 Thread vkorukanti
Github user vkorukanti commented on a diff in the pull request: https://github.com/apache/drill/pull/151#discussion_r39177644 --- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/HivePartitionDescriptor.java --- @@ -62,7 +62,7 @@ public

[GitHub] drill pull request: DRILL-3746: Get Hive partition values from Met...

2015-09-10 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/drill/pull/151 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is

[jira] [Created] (DRILL-3758) InvalidRecorException while selecting from a table with multiple parquet file

2015-09-10 Thread Victoria Markman (JIRA)
Victoria Markman created DRILL-3758: --- Summary: InvalidRecorException while selecting from a table with multiple parquet file Key: DRILL-3758 URL: https://issues.apache.org/jira/browse/DRILL-3758

[GitHub] drill pull request: DRILL-1942-hygiene

2015-09-10 Thread sudheeshkatkam
Github user sudheeshkatkam commented on a diff in the pull request: https://github.com/apache/drill/pull/120#discussion_r39194893 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/fragment/FragmentManager.java --- @@ -38,36 +37,43 @@ * @return True if the

[GitHub] drill pull request: Update Calcite and Add Test cases

2015-09-10 Thread sudheeshkatkam
Github user sudheeshkatkam commented on a diff in the pull request: https://github.com/apache/drill/pull/152#discussion_r39192041 --- Diff: exec/java-exec/src/test/java/org/apache/drill/BaseTestQuery.java --- @@ -379,6 +379,16 @@ protected static void parseErrorHelper(final String

Re: In list filter evaluation : room for improvement in run-time code generation.

2015-09-10 Thread Jinfeng Ni
I think a semi-join is not valid in this case, since the original query has 5 in-lists ORed together. If a semi-join is used, then the rows that do not qualify for the first in-list filter would be pruned out, which is not valid, since they may qualify for the second in-list filter. That's why
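A small illustration of why chained semi-joins do not implement ORed in-lists. It uses two made-up in-lists (a IN (1, 2) OR b IN (3, 4)) instead of the five in the real query, and models each row as an int[] of {a, b}. Passing rows through one semi-join after another keeps only rows that match every list, which is AND semantics.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class InListOrDemo {
  // Two hypothetical in-lists: a IN (1, 2) OR b IN (3, 4).
  static final Set<Integer> A_LIST = new HashSet<>(Arrays.asList(1, 2));
  static final Set<Integer> B_LIST = new HashSet<>(Arrays.asList(3, 4));

  /** Correct semantics: keep a row if it satisfies ANY of the ORed in-lists. */
  static List<int[]> orFilter(List<int[]> rows) {
    List<int[]> out = new ArrayList<>();
    for (int[] r : rows) {
      if (A_LIST.contains(r[0]) || B_LIST.contains(r[1])) {
        out.add(r);
      }
    }
    return out;
  }

  /** Chained semi-joins: the first join drops rows the second list could still have matched. */
  static List<int[]> chainedSemiJoins(List<int[]> rows) {
    List<int[]> afterFirst = new ArrayList<>();
    for (int[] r : rows) {
      if (A_LIST.contains(r[0])) {
        afterFirst.add(r);
      }
    }
    List<int[]> afterSecond = new ArrayList<>();
    for (int[] r : afterFirst) {
      if (B_LIST.contains(r[1])) {
        afterSecond.add(r);
      }
    }
    return afterSecond;
  }

  public static void main(String[] args) {
    List<int[]> rows = Arrays.asList(
        new int[] {1, 9}, new int[] {9, 3}, new int[] {2, 4}, new int[] {9, 9});
    System.out.println(orFilter(rows).size());         // 3
    System.out.println(chainedSemiJoins(rows).size()); // 1
  }
}
```

The row {9, 3} is the case Jinfeng describes: it fails the first in-list, so the first semi-join removes it, even though it qualifies through the second in-list.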

[GitHub] drill pull request: DRILL-1942-hygiene

2015-09-10 Thread sudheeshkatkam
Github user sudheeshkatkam commented on a diff in the pull request: https://github.com/apache/drill/pull/120#discussion_r39193558 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/fragment/FragmentManager.java --- @@ -38,36 +37,43 @@ * @return True if the

[GitHub] drill pull request: DRILL-1942-hygiene

2015-09-10 Thread jinfengni
Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/120#discussion_r39193879 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/fragment/FragmentManager.java --- @@ -38,36 +37,43 @@ * @return True if the

Re: In list filter evaluation : room for improvement in run-time code generation.

2015-09-10 Thread Hsuan Yi Chu
I believe the usage of Semi-Join had been proposed before. Do you think that new operator would help in this scenario? On Wed, Sep 9, 2015 at 8:16 PM, Jinfeng Ni wrote: > The reason that the in-list join approach is not fast enough : > the query has 5 in-lists ORed

[GitHub] drill pull request: Update Calcite and Add Test cases

2015-09-10 Thread jinfengni
Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/152#discussion_r39189463 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/TestWindowFunctions.java --- @@ -165,49 +258,182 @@ public void testWindowGroupByOnView()

[GitHub] drill pull request: Update Calcite and Add Test cases

2015-09-10 Thread jinfengni
Github user jinfengni commented on the pull request: https://github.com/apache/drill/pull/152#issuecomment-139324948 +1

[jira] [Resolved] (DRILL-3645) typo in drill documentation

2015-09-10 Thread Kristine Hahn (JIRA)
[ https://issues.apache.org/jira/browse/DRILL-3645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kristine Hahn resolved DRILL-3645. -- Resolution: Fixed Fix Version/s: 1.2.0 Fix has been published > typo in drill

[GitHub] drill pull request: Update Calcite and Add Test cases

2015-09-10 Thread hsuanyi
Github user hsuanyi commented on the pull request: https://github.com/apache/drill/pull/152#issuecomment-139341626 Will report after done with the tests

[GitHub] drill pull request: Update Calcite and Add Test cases

2015-09-10 Thread hsuanyi
Github user hsuanyi commented on a diff in the pull request: https://github.com/apache/drill/pull/152#discussion_r39198057 --- Diff: exec/java-exec/src/test/java/org/apache/drill/BaseTestQuery.java --- @@ -379,6 +379,16 @@ protected static void parseErrorHelper(final String

[jira] [Created] (DRILL-3759) Make partition pruning multi-phased to reduce the working set kept in memory

2015-09-10 Thread Aman Sinha (JIRA)
Aman Sinha created DRILL-3759: - Summary: Make partition pruning multi-phased to reduce the working set kept in memory Key: DRILL-3759 URL: https://issues.apache.org/jira/browse/DRILL-3759 Project: Apache

[GitHub] drill pull request: DRILL-1942-hygiene

2015-09-10 Thread cwestin
Github user cwestin commented on a diff in the pull request: https://github.com/apache/drill/pull/120#discussion_r39197169 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/vector/BitVector.java --- @@ -211,8 +220,11 @@ public TransferPair makeTransferPair(ValueVector

[DISCUSS] Querying nested data from JSON with and without json_all_text_mode results in errors

2015-09-10 Thread Khurram Faraaz
Hi, Querying nested data from a JSON file with and without setting store.json.all_text_mode results in errors. Data that was used in the test is available here - https://data.cityofchicago.org/Health-Human-Services/Food-Inspections/4ijn-s7e5 There is an option (on the top right) to Export as

[GitHub] drill pull request: Update Calcite and Add Test cases

2015-09-10 Thread hsuanyi
Github user hsuanyi commented on the pull request: https://github.com/apache/drill/pull/152#issuecomment-139341107 Addressed jinfengni's and sudheeshkatkam's comments. Thanks!!!

[jira] [Created] (DRILL-3764) Support the ability to identify and/or skip records when a function evaluation fails

2015-09-10 Thread Aman Sinha (JIRA)
Aman Sinha created DRILL-3764: - Summary: Support the ability to identify and/or skip records when a function evaluation fails Key: DRILL-3764 URL: https://issues.apache.org/jira/browse/DRILL-3764
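One way to picture the requested behavior: evaluate a function per record and, instead of failing the whole query on the first error, record which inputs failed and skip them. This is a hypothetical row-at-a-time sketch. Drill evaluates expressions through generated vectorized code, and the class and method names here are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

final class SkipOnErrorEvaluator {
  /** Outputs of the records that evaluated successfully, plus the indices that failed. */
  static final class Result<R> {
    final List<R> values = new ArrayList<>();
    final List<Integer> failedIndices = new ArrayList<>();
  }

  /** Applies fn to every record; a record whose evaluation throws is identified and skipped. */
  static <T, R> Result<R> evaluate(List<T> records, Function<T, R> fn) {
    Result<R> result = new Result<>();
    for (int i = 0; i < records.size(); i++) {
      try {
        result.values.add(fn.apply(records.get(i)));
      } catch (RuntimeException e) {
        result.failedIndices.add(i); // identify the bad record instead of failing the query
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> input = Arrays.asList("1", "2", "abc", "4");
    Result<Integer> r = evaluate(input, Integer::parseInt);
    System.out.println(r.values + " failed at " + r.failedIndices); // [1, 2, 4] failed at [2]
  }
}
```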

Re: [DISCUSS] Querying nested data from JSON with and without json_all_text_mode results in errors

2015-09-10 Thread Abdel Hakim Deneche
It's interesting that the following query also fails: SELECT COUNT(meta) FROM `rows.json`. The schema change is nested inside meta.view, so even though we are only counting how many meta "groups" we have, Drill still reads and materializes all the content of the meta field. On Thu, Sep 10, 2015 at

Re: Directory and file based partition pruning

2015-09-10 Thread Jinfeng Ni
Seems to me one important reason we hit out of heap memory for the partition prune rule is that the rule itself is invoked multiple times, even after the filter has been pushed into the scan in the first call. I tried with a simple unit test TestPartitionFilter:testPartitionFilter1_Parquet_from_CTAS(), here is
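A sketch of the kind of guard that avoids the redundant work: the rule checks whether the filter has already been pushed into the scan and does nothing if so. The Scan class, the filterPushed flag and the loop standing in for the planner are all hypothetical; they are not Drill's GroupScan or Calcite's rule API, and this is not the fix tracked in DRILL-3765.

```java
final class PruneRuleGuardSketch {
  /** Hypothetical stand-in for a scan node; not Drill's GroupScan API. */
  static final class Scan {
    final int fileCount;
    final boolean filterPushed;

    Scan(int fileCount, boolean filterPushed) {
      this.fileCount = fileCount;
      this.filterPushed = filterPushed;
    }
  }

  /** Counts how often the expensive pruning work actually runs. */
  int pruneInvocations = 0;

  /** Guard: skip scans whose filter was already pushed down by an earlier firing. */
  boolean matches(Scan scan) {
    return !scan.filterPushed;
  }

  /** Stands in for the costly per-file evaluation; returns a pruned scan marked as done. */
  Scan onMatch(Scan scan, int survivingFiles) {
    pruneInvocations++;
    return new Scan(survivingFiles, true);
  }

  /** Simulates a planner offering the same rule to the scan several times. */
  Scan plan(Scan scan, int rounds, int survivingFiles) {
    for (int i = 0; i < rounds; i++) {
      if (matches(scan)) {
        scan = onMatch(scan, survivingFiles);
      }
    }
    return scan;
  }

  public static void main(String[] args) {
    PruneRuleGuardSketch planner = new PruneRuleGuardSketch();
    Scan result = planner.plan(new Scan(10_000, false), 5, 40);
    System.out.println(result.fileCount + " files, pruning ran " + planner.pruneInvocations + " time(s)");
  }
}
```

Without the matches() guard, each of the five rounds would repeat the per-file evaluation over the already-pruned scan.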

[jira] [Created] (DRILL-3765) Partition prune rule is unnecessary fired multiple times.

2015-09-10 Thread Jinfeng Ni (JIRA)
Jinfeng Ni created DRILL-3765: - Summary: Partition prune rule is unnecessary fired multiple times. Key: DRILL-3765 URL: https://issues.apache.org/jira/browse/DRILL-3765 Project: Apache Drill

Re: Directory and file based partition pruning

2015-09-10 Thread Aman Sinha
Yes, it is a good point about multiple invocations of the PruneScan rule. The other point about using Java heap is not correct. The rule does off-heap allocation using memory buffer from QueryContext and in the finally block releases the memory. Aman On Thu, Sep 10, 2015 at 6:18 PM, Jinfeng Ni

Re: In list filter evaluation : room for improvement in run-time code generation.

2015-09-10 Thread Jacques Nadeau
I haven't looked at your patch yet. Did you try to address the issue where we do redundant retrieval in all situations, or only in some? It seems like it should be managed at the class generator level, since it could have this context. Also note that you probably would see a substantial benefit if
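The class-generator-level approach can be sketched as memoizing column reads within one generated method: the first reference to a column emits a read into a local variable, and later references reuse that variable. The emitted strings (aVector.get(index), v0, and so on) are invented for illustration and do not reflect Drill's ClassGenerator API or the change in pull request 153.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ColumnReadCacheSketch {
  /** Maps a column name to the local variable that already holds its value for the current row. */
  private final Map<String, String> readCache = new LinkedHashMap<>();
  private final StringBuilder body = new StringBuilder();

  /** Emits a read the first time a column is referenced; later references reuse the variable. */
  String columnRef(String column) {
    String var = readCache.get(column);
    if (var == null) {
      var = "v" + readCache.size();
      body.append("int ").append(var).append(" = ").append(column).append("Vector.get(index);\n");
      readCache.put(column, var);
    }
    return var;
  }

  /** Builds an in-list test such as col IN (1, 2, 3) as a chain of equality checks. */
  String inListFilter(String column, List<Integer> values) {
    StringBuilder sb = new StringBuilder();
    for (int v : values) {
      if (sb.length() > 0) {
        sb.append(" || ");
      }
      sb.append(columnRef(column)).append(" == ").append(v);
    }
    return sb.toString();
  }

  String generated() {
    return body.toString();
  }

  public static void main(String[] args) {
    ColumnReadCacheSketch gen = new ColumnReadCacheSketch();
    String cond = gen.inListFilter("a", Arrays.asList(1, 2, 3));
    System.out.print(gen.generated());          // int v0 = aVector.get(index);
    System.out.println("return " + cond + ";"); // return v0 == 1 || v0 == 2 || v0 == 3;
  }
}
```

Keeping the cache in the generator, rather than in each function implementation, is what lets every expression in the generated method share the same read.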

[jira] [Created] (DRILL-3761) CastIntDecimal implementation should not update the input holder.

2015-09-10 Thread Jinfeng Ni (JIRA)
Jinfeng Ni created DRILL-3761: - Summary: CastIntDecimal implementation should not update the input holder. Key: DRILL-3761 URL: https://issues.apache.org/jira/browse/DRILL-3761 Project: Apache Drill
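Drill's function holders are plain mutable objects, so an implementation that modifies its input holder can corrupt the value seen by any other expression that reads the same holder. The hypothetical holder classes and scaling logic below only illustrate that general hazard; they are not the actual CastIntDecimal code.

```java
final class CastHolderSketch {
  /** Hypothetical mutable holders, loosely modeled on the holder style; not Drill's classes. */
  static final class IntHolder {
    int value;
  }

  static final class Decimal9Holder {
    int value;
    int scale;
  }

  /** Problematic pattern: scales the input in place, so later readers of `in` see 700, not 7. */
  static void castBad(IntHolder in, int scale, Decimal9Holder out) {
    for (int i = 0; i < scale; i++) {
      in.value *= 10;
    }
    out.value = in.value;
    out.scale = scale;
  }

  /** Safer pattern: work on a local copy and leave the input holder untouched. */
  static void castGood(IntHolder in, int scale, Decimal9Holder out) {
    int unscaled = in.value;
    for (int i = 0; i < scale; i++) {
      unscaled *= 10;
    }
    out.value = unscaled;
    out.scale = scale;
  }

  public static void main(String[] args) {
    IntHolder in = new IntHolder();
    in.value = 7;
    Decimal9Holder out = new Decimal9Holder();
    castGood(in, 2, out);
    System.out.println(in.value + " -> " + out.value + " scale " + out.scale); // 7 -> 700 scale 2
  }
}
```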

Re: Directory and file based partition pruning

2015-09-10 Thread Jinfeng Ni
I opened DRILL-3765 for the multiple rule execution issue: https://issues.apache.org/jira/browse/DRILL-3765 On Thu, Sep 10, 2015 at 5:34 PM, Jinfeng Ni wrote: > Seems to me one important reason we hit out of heap memory for partition > prune rule is that the rule itself

[jira] [Created] (DRILL-3762) NPE : Query nested JSON data

2015-09-10 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-3762: - Summary: NPE : Query nested JSON data Key: DRILL-3762 URL: https://issues.apache.org/jira/browse/DRILL-3762 Project: Apache Drill Issue Type: Bug

[GitHub] drill pull request: Drill 3754: Remove redundancy in run-time gene...

2015-09-10 Thread jinfengni
GitHub user jinfengni opened a pull request: https://github.com/apache/drill/pull/153 Drill 3754: Remove redundancy in run-time generated code for common column references Test done: unit test pre-commit test. You can merge this pull request into a Git repository

[GitHub] drill pull request: DRILL-1942-hygiene

2015-09-10 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/drill/pull/120

[GitHub] drill pull request: DRILL-1942-concurrency-test: new smoke test fo...

2015-09-10 Thread cwestin
Github user cwestin commented on the pull request: https://github.com/apache/drill/pull/105#issuecomment-139409971 Addressed Sudheesh's comments. Unless Jason has anything else, can we please get this merged now?

Re: Directory and file based partition pruning

2015-09-10 Thread Aman Sinha
Agree on the N-phased approach. I have filed a JIRA for the enhancement: DRILL-3759. Regarding the simplification of the expression tree logic: did you mean the logic in FindPartitionConditions or the Interpreter? Perhaps you can add comments in the JIRA with some explanation. I am in favor
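The multi-phased idea in DRILL-3759 can be approximated as evaluating the pruning condition over bounded batches of partitions, so the working set is at most one batch of candidates plus the survivors. The sketch assumes partitions are identified by directory-name strings and that the pruning condition is a plain Java predicate; it is not the planned Drill implementation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

final class BatchedPruningSketch {
  /** Evaluates keep over at most batchSize partitions at a time and returns the survivors. */
  static List<String> prune(Iterator<String> partitions, Predicate<String> keep, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    List<String> survivors = new ArrayList<>();
    List<String> batch = new ArrayList<>(batchSize);
    while (partitions.hasNext()) {
      batch.add(partitions.next());
      // Flush when the batch is full or the input is exhausted.
      if (batch.size() == batchSize || !partitions.hasNext()) {
        for (String p : batch) {
          if (keep.test(p)) {
            survivors.add(p);
          }
        }
        batch.clear(); // only one batch of candidates is held at a time
      }
    }
    return survivors;
  }

  public static void main(String[] args) {
    // Hypothetical directory partitions named by year.
    List<String> dirs = new ArrayList<>();
    for (int year = 1990; year < 2016; year++) {
      dirs.add(String.valueOf(year));
    }
    List<String> kept = prune(dirs.iterator(), d -> d.equals("2014") || d.equals("2015"), 10);
    System.out.println(kept); // [2014, 2015]
  }
}
```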

[jira] [Created] (DRILL-3763) Cancel (Ctrl-C) one of concurrent queries results in ChannelClosedException

2015-09-10 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-3763: - Summary: Cancel (Ctrl-C) one of concurrent queries results in ChannelClosedException Key: DRILL-3763 URL: https://issues.apache.org/jira/browse/DRILL-3763 Project:

Re: Maven build failing on checkstyle

2015-09-10 Thread Daniel Barclay
Ted Dunning wrote: ... I think that we should have a rule that every class should have javadoc on the class. Since it will take a long time to get to that state, we should probably start with whichever are the most important classes to document. That fuzzy set of "most important" classes

Re: Directory and file based partition pruning

2015-09-10 Thread Jinfeng Ni
I got the impression of Java heap memory because one customer complained about running out of heap memory when pruning a large number of files. Is it possible that the rule puts the value vector in direct memory, but also uses object references which are proportional to

[GitHub] drill pull request: Update Calcite and Add Test cases

2015-09-10 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/drill/pull/152

Re: Maven build failing on checkstyle

2015-09-10 Thread Edmon Begoli
I will open an issue and work on this proposed javadoc. I am a "first principles" type of person, and I do not see myself being able to effectively do stuff until I understand the Drill API inside-out myself. A result of the learning process should be contributed back to the project. I'll start

Can this be merged in - DRILL-3724?

2015-09-10 Thread Edmon Begoli
It is an easy fix for javadoc, and it has been approved by Jacques: https://github.com/apache/drill/pull/139 Thank you, Edmon

Re: In list filter evaluation : room for improvement in run-time code generation.

2015-09-10 Thread Hsuan Yi Chu
Can we cartesian-join all the values in the in-lists and rewrite them as a single in-list? For example, say the original where-clause is "a in (1, 2) or b in (3, 4)". Can we implement a rule to let Calcite treat it as "(a, b) in ((1,3),(1,4),(2,3),(2,4))"?

Re: In list filter evaluation : room for improvement in run-time code generation.

2015-09-10 Thread Hsuan Yi Chu
Wait... that transformation works for AND but not for OR... On Thu, Sep 10, 2015 at 1:12 PM, Hsuan Yi Chu wrote: > Can we cartesian-join all the values in the in list and rewrite it as a > single in list: > > For example, > Say, the original where-clause is > > "a in (1, 2)
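A quick check of the correction: the tuple form (a, b) IN ((1,3),(1,4),(2,3),(2,4)) matches exactly the rows satisfying a IN (1, 2) AND b IN (3, 4), but a row such as (1, 9) satisfies the OR form and not the tuple form. Pairs are encoded as "a,b" strings purely for brevity.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class InListCartesianSketch {
  static final List<Integer> A_VALUES = Arrays.asList(1, 2);
  static final List<Integer> B_VALUES = Arrays.asList(3, 4);

  /** The cartesian product of the two lists: (1,3), (1,4), (2,3), (2,4). */
  static final Set<String> PAIRS = new HashSet<>();

  static {
    for (int a : A_VALUES) {
      for (int b : B_VALUES) {
        PAIRS.add(a + "," + b);
      }
    }
  }

  static boolean andForm(int a, int b) {
    return A_VALUES.contains(a) && B_VALUES.contains(b);
  }

  static boolean orForm(int a, int b) {
    return A_VALUES.contains(a) || B_VALUES.contains(b);
  }

  static boolean tupleForm(int a, int b) {
    return PAIRS.contains(a + "," + b);
  }

  public static void main(String[] args) {
    // (1, 9): a matches its list, b does not.
    System.out.println(andForm(1, 9) + " " + orForm(1, 9) + " " + tupleForm(1, 9)); // false true false
  }
}
```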

[GitHub] drill pull request: DRILL-1942-concurrency-test: new smoke test fo...

2015-09-10 Thread cwestin
Github user cwestin commented on the pull request: https://github.com/apache/drill/pull/105#issuecomment-139377475 I added code to collect errors, as well as to interrupt the test thread when an error occurs.
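The pattern described in the comment (collect errors from worker threads, then interrupt the waiting test thread) can be sketched with plain java.util.concurrent. This is a hypothetical stand-in, not the code in TestTpchDistributedConcurrent; the Runnable tasks represent queries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class ConcurrentErrorCollectorSketch {
  /** Runs tasks concurrently; a failure is recorded and the waiting caller is interrupted. */
  static List<Throwable> runAll(List<Runnable> tasks) {
    List<Throwable> errors = new CopyOnWriteArrayList<>();
    Thread caller = Thread.currentThread();
    ExecutorService pool = Executors.newFixedThreadPool(4);
    for (Runnable task : tasks) {
      pool.submit(() -> {
        try {
          task.run();
        } catch (Throwable t) {
          errors.add(t);      // record the failure first...
          caller.interrupt(); // ...then wake the caller so it stops waiting
        }
      });
    }
    pool.shutdown();
    try {
      pool.awaitTermination(1, TimeUnit.MINUTES);
    } catch (InterruptedException e) {
      pool.shutdownNow(); // a task failed: cancel the rest instead of waiting for them
    }
    // In this sketch only workers interrupt the caller, so clear an interrupt that
    // arrived after the pool had already finished.
    Thread.interrupted();
    return errors;
  }

  public static void main(String[] args) {
    List<Runnable> tasks = new ArrayList<>();
    for (int i = 0; i < 8; i++) {
      tasks.add(() -> { });
    }
    tasks.add(() -> { throw new IllegalStateException("simulated query failure"); });
    List<Throwable> errors = runAll(tasks);
    System.out.println(errors.size() + " error(s): " + errors.get(0).getMessage());
  }
}
```

Adding the error before calling interrupt() matters: whichever way the caller wakes up, the failure is already in the list it returns.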

[jira] [Created] (DRILL-3760) Casting interval to string and back to interval fails

2015-09-10 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-3760: - Summary: Casting interval to string and back to interval fails Key: DRILL-3760 URL: https://issues.apache.org/jira/browse/DRILL-3760 Project: Apache

[GitHub] drill pull request: DRILL-1942-hygiene

2015-09-10 Thread jinfengni
Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/120#discussion_r39214700 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/vector/BitVector.java --- @@ -211,8 +220,11 @@ public TransferPair makeTransferPair(ValueVector

Re: [DISCUSS] Querying nested data from JSON with and without json_all_text_mode results in errors

2015-09-10 Thread Jason Altekruse
This is a known issue, we have been describing this schema change scenario as a change in "data shape". Here is a very simple dataset that will produce the same error { "a" : 1 } { "a" : { "b" : 3} } We currently only use all_text_mode to change the type of the data at a leaf in the schema.
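To illustrate the distinction Jason draws: all_text_mode rewrites scalar leaves as text, but it cannot reconcile a field that is a scalar in one record and a map in another. The sketch models parsed JSON as nested java.util.Map and Long values (an assumption; this is not Drill's JSON reader) and reuses the two records from the message.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class JsonShapeSketch {
  enum Shape { SCALAR, MAP }

  /** all_text_mode-style normalization: leaves become text, maps keep their structure. */
  static Object allTextMode(Object value) {
    if (value instanceof Map) {
      Map<String, Object> out = new LinkedHashMap<>();
      for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
        out.put(String.valueOf(e.getKey()), allTextMode(e.getValue()));
      }
      return out;
    }
    return value == null ? null : String.valueOf(value);
  }

  static Shape shapeOf(Object value) {
    return value instanceof Map ? Shape.MAP : Shape.SCALAR;
  }

  /** True when the field has the same shape in every record after normalization. */
  static boolean consistentShape(List<Map<String, Object>> records, String field) {
    Shape first = null;
    for (Map<String, Object> r : records) {
      Shape s = shapeOf(allTextMode(r.get(field)));
      if (first == null) {
        first = s;
      } else if (s != first) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // { "a" : 1 } and { "a" : { "b" : 3 } }
    Map<String, Object> r1 = Collections.singletonMap("a", 1L);
    Map<String, Object> r2 = Collections.singletonMap("a", Collections.singletonMap("b", 3L));
    System.out.println(consistentShape(Arrays.asList(r1, r2), "a")); // false: scalar vs map
  }
}
```

A change in leaf type alone, such as 1 in one record and "x" in another, passes this check once both values are text. That matches the case all_text_mode is meant to handle.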

[GitHub] drill pull request: DRILL-1942-concurrency-test: new smoke test fo...

2015-09-10 Thread cwestin
Github user cwestin commented on a diff in the pull request: https://github.com/apache/drill/pull/105#discussion_r39216415 --- Diff: exec/java-exec/src/test/java/org/apache/drill/TestTpchDistributedConcurrent.java --- @@ -0,0 +1,177 @@ +/** + * Licensed to the Apache

[GitHub] drill pull request: DRILL-1942-hygiene

2015-09-10 Thread cwestin
Github user cwestin commented on a diff in the pull request: https://github.com/apache/drill/pull/120#discussion_r39196959 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/fragment/FragmentManager.java --- @@ -38,36 +37,43 @@ * @return True if the