date:20190411

[jira] [Created] (DRILL-7173) Analyze table may fail when prefer_plain_java is set to true on codegen for resetValues

2019-04-11 Thread Boaz Ben-Zvi (JIRA)

Boaz Ben-Zvi created DRILL-7173:
---

 Summary: Analyze table may fail when prefer_plain_java is set to 
true on codegen for resetValues 
 Key: DRILL-7173
 URL: https://issues.apache.org/jira/browse/DRILL-7173
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Codegen
Affects Versions: 1.15.0
 Environment: *prefer_plain_java: true*

 
Reporter: Boaz Ben-Zvi
 Fix For: 1.17.0


  The *prefer_plain_java* compile option is useful for debugging generated 
code (it can be set in drill-override.conf; the default value is false). When set 
to true, some "analyze table" calls generate code that fails to compile: the 
generated resetValues() declares a SchemaChangeException that the resetValues() 
it overrides in the Streaming Aggr template does not declare.

For example:
{noformat}
apache drill (dfs.tmp)> create table lineitem3 as select * from 
cp.`tpch/lineitem.parquet`;
+--+---+
| Fragment | Number of records written |
+--+---+
| 0_0 | 60175 |
+--+---+
1 row selected (2.06 seconds)
apache drill (dfs.tmp)> analyze table lineitem3 compute statistics;
Error: SYSTEM ERROR: CompileException: File 
'org.apache.drill.exec.compile.DrillJavaFileObject[StreamingAggregatorGen4.java]',
 Line 7869, Column 20: StreamingAggregatorGen4.java:7869: error: resetValues() 
in org.apache.drill.exec.test.generated.StreamingAggregatorGen4 cannot override 
resetValues() in 
org.apache.drill.exec.physical.impl.aggregate.StreamingAggTemplate
 public boolean resetValues()
 ^
 overridden method does not throw 
org.apache.drill.exec.exception.SchemaChangeException 
(compiler.err.override.meth.doesnt.throw)
{noformat}
 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Re: Query Question

2019-04-11 Thread Ted Dunning

The semantics for zip with different-length arguments tend to be either to
ignore the tail of the longer argument, as Python does with zip, or to reuse
shorter arguments to fill out to the length of the longest argument, as R does
with cbind and rbind. R is nice about returning with a warning if the reused
value doesn't come out even at the end.

> cbind(c(1,2), c(4,5,6))
>      [,1] [,2]
> [1,]    1    4
> [2,]    2    5
> [3,]    1    6
> Warning message:
> In cbind(c(1, 2), c(4, 5, 6)) :
>   number of rows of result is not a multiple of vector length (arg 1)


I think that either definition is fine, but that Python's truncate style is
probably easier and makes more sense in a database environment. The most
common use case for R's semantics is to build tables with rows that have
all combinations of sets of values (i.e. the cross product). In a database,
we already have a better mechanism to build the cross product, so having zip
behave like Python is nice.
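
To make the two conventions concrete, here is a minimal Python sketch. The
helper names zip_truncate, zip_recycle and cross_product are made up for
illustration; none of them is a Drill function, and the recycling version is
only an approximation of what R's cbind does (it issues no warning).

```python
from itertools import cycle, islice, product


def zip_truncate(a, b):
    """Python-style zip: stop at the end of the shorter list."""
    return list(zip(a, b))


def zip_recycle(a, b):
    """R-style recycling: reuse the shorter list until the longer one ends."""
    if not a or not b:
        return []
    n = max(len(a), len(b))
    return list(zip(islice(cycle(a), n), islice(cycle(b), n)))


def cross_product(a, b):
    """All combinations, which is what flattening both arrays produces."""
    return list(product(a, b))


print(zip_truncate([1, 2], [4, 5, 6]))           # [(1, 4), (2, 5)]
print(zip_recycle([1, 2], [4, 5, 6]))            # [(1, 4), (2, 5), (1, 6)]
print(len(cross_product([1, 2, 3], [4, 5, 6])))  # 9
```

The recycled result matches the cbind output shown above, while the truncated
result simply drops the unmatched 6.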


On Thu, Apr 11, 2019 at 8:40 AM Aman Sinha  wrote:

> > I thought flatten() would be the answer, however, if I flatten the
> columns, I get the following result:
>
> Regarding the flatten() output, this is expected because doing a 'SELECT
> flatten(a),  flatten(b) FROM T'  is equivalent to doing a cross-product of
> the 2 arrays.
>
> In your example, both arrays are the same length, but what would you expect
> the output to be if they were different ?   I don't see a direct SQL way of
> doing it but
> even with UDFs the semantics should be defined.
>
> Aman
>
> On Thu, Apr 11, 2019 at 6:37 AM Charles Givre  wrote:
>
> > That’s a good idea.  I’ll work on an equivalent ZIP() function and submit
> > as a separate PR.
> > — C
> >
> > > On Apr 10, 2019, at 20:44, Paul Rogers 
> > wrote:
> > >
> > > Hi Charles,
> > >
> > > In Python [1], the "zip" function does this task:
> > >
> > >
> > > zip([1, 2, 3], [4, 5, 6]) --> [(1, 4), (2, 5), (3, 6)]
> > >
> > >
> > > When you gathered the list of functions for the Drill book, did you
> come
> > across anything like this in Drill? I presume you didn't, hence the
> > question. I did a quick (incomplete) check and didn't see any likely
> > candidates.
> > >
> > > Perhaps you could create such a function.
> > >
> > > Once you have the zipped result, you could flatten to get the pairs as
> > rows.
> > >
> > >
> > > Thanks,
> > > - Paul
> > >
> > >
> > >
> > >On Wednesday, April 10, 2019, 5:26:10 PM PDT, Charles Givre <
> > cgi...@gmail.com> wrote:
> > >
> > > Hello Drillers,
> > > I have a query question for you.  I have some really ugly data that has
> > a field like this:
> > >
> > > compound_field : { “field_1”: [1,2,3],
> > > “field_2”:[4,5,6]
> > > }
> > >
> > > I would like to map fields 1 and 2 to columns so that the end result
> is:
> > >
> > > field1 | field2
> > > 1  |  4
> > > 2  |  5
> > > 3  |  6
> > >
> > > I thought flatten() would be the answer, however, if I flatten the
> > columns, I get the following result:
> > >
> > > field1 | field2
> > > 1  |  4
> > > 1  |  5
> > > 1  |  6
> > >
> > > Does anyone have any suggestions?
> > > Thanks,
> > > —C
> >
> >
>

[jira] [Resolved] (DRILL-7165) Redundant Checksum calculating for ASC files

2019-04-11 Thread Sorabh Hamirwasia (JIRA)



 [ 
https://issues.apache.org/jira/browse/DRILL-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia resolved DRILL-7165.
--
Resolution: Fixed

> Redundant Checksum calculating for ASC files
> 
>
> Key: DRILL-7165
> URL: https://issues.apache.org/jira/browse/DRILL-7165
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Tools, Build & Test
>Affects Versions: 1.15.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>
> Currently {{checksum-maven-plugin}} creates sha-512 checksum files for tar and 
> zip archives and for ASC (signature) files. The latter are redundant. For 
> example:
> apache-drill-1.15.0-src.tar.gz.asc.sha512
> apache-drill-1.15.0-src.zip.asc.sha512
> apache-drill-1.15.0.tar.gz.asc.sha512
> The proper list of files: 
> [http://home.apache.org/~vitalii/drill/releases/1.15.0/rc2/]




[GitHub] [drill] sohami merged pull request #1743: DRILL-7165: Redundant Checksum calculating for ASC files

2019-04-11 Thread GitBox

sohami merged pull request #1743: DRILL-7165: Redundant Checksum calculating 
for ASC files
URL: https://github.com/apache/drill/pull/1743
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] sohami closed pull request #1745: DRILL-7166: Count query with wildcard should skip reading of metadata summary file

2019-04-11 Thread GitBox

sohami closed pull request #1745: DRILL-7166: Count query with wildcard should 
skip reading of metadata summary file
URL: https://github.com/apache/drill/pull/1745
 
 
   



[jira] [Created] (DRILL-7172) README files for steps describing building C++ Drill client (with protobuf) needs to be updated

2019-04-11 Thread Kunal Khatua (JIRA)

Kunal Khatua created DRILL-7172:
---

 Summary: README files for steps describing building C++ Drill 
client (with protobuf) needs to be updated
 Key: DRILL-7172
 URL: https://issues.apache.org/jira/browse/DRILL-7172
 Project: Apache Drill
  Issue Type: Task
Reporter: Kunal Khatua
Assignee: Denys Ordynskiy


During the 1.16.0 release, it was noticed that the steps (primarily library 
versions) for rebuilding with protobuf-3.6.1 were outdated. 

For example, the Boost library version for building is reported as 1.53 in one 
place, whereas it is 1.60 in another. The steps worked on an Ubuntu setup, but 
failed for CentOS 7x.




[jira] [Created] (DRILL-7171) Count(*) query on leaf level directory is not reading summary cache file.

2019-04-11 Thread Venkata Jyothsna Donapati (JIRA)

Venkata Jyothsna Donapati created DRILL-7171:


 Summary: Count(*) query on leaf level directory is not reading 
summary cache file.
 Key: DRILL-7171
 URL: https://issues.apache.org/jira/browse/DRILL-7171
 Project: Apache Drill
  Issue Type: Bug
Reporter: Venkata Jyothsna Donapati
Assignee: Venkata Jyothsna Donapati


The leaf-level directory doesn't store the metadata directories file. While 
reading the summary, if the directories cache file is not present, the cache 
is assumed to be possibly corrupt and reading of the summary cache file is 
skipped. The metadata directories cache file should be created at the leaf 
level.




[GitHub] [drill] amansinha100 closed pull request #671: DRILL-4347: Propagate distinct row count for joins from logical plann…

2019-04-11 Thread GitBox

amansinha100 closed pull request #671: DRILL-4347: Propagate distinct row count 
for joins from logical plann…
URL: https://github.com/apache/drill/pull/671
 
 
   



[GitHub] [drill] gparai commented on issue #606: Drill-1328: Compute and use statistics in Drill

2019-04-11 Thread GitBox

gparai commented on issue #606: Drill-1328: Compute and use statistics in Drill
URL: https://github.com/apache/drill/pull/606#issuecomment-482266211
 
 
   The changes were merged with the PR 
https://github.com/apache/drill/pull/729. I will close this PR.



[jira] [Created] (DRILL-7170) IllegalStateException: Record count not set for this vector container

2019-04-11 Thread Sorabh Hamirwasia (JIRA)

Sorabh Hamirwasia created DRILL-7170:


 Summary: IllegalStateException: Record count not set for this 
vector container
 Key: DRILL-7170
 URL: https://issues.apache.org/jira/browse/DRILL-7170
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Reporter: Sorabh Hamirwasia
 Fix For: 1.17.0



{code:java}
Query: 
/root/drillAutomation/master/framework/resources/Advanced/tpcds/tpcds_sf1/original/maprdb/json/query95.sql
WITH ws_wh AS
(
SELECT ws1.ws_order_number,
ws1.ws_warehouse_sk wh1,
ws2.ws_warehouse_sk wh2
FROM   web_sales ws1,
web_sales ws2
WHERE  ws1.ws_order_number = ws2.ws_order_number
AND    ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk)
SELECT
Count(DISTINCT ws_order_number) AS `order count` ,
Sum(ws_ext_ship_cost)   AS `total shipping cost` ,
Sum(ws_net_profit)  AS `total net profit`
FROM web_sales ws1 ,
date_dim ,
customer_address ,
web_site
WHERE  d_date BETWEEN '2000-04-01' AND  (
Cast('2000-04-01' AS DATE) + INTERVAL '60' day)
AND  ws1.ws_ship_date_sk = d_date_sk
AND  ws1.ws_ship_addr_sk = ca_address_sk
AND  ca_state = 'IN'
AND  ws1.ws_web_site_sk = web_site_sk
AND  web_company_name = 'pri'
AND  ws1.ws_order_number IN
(
SELECT ws_order_number
FROM   ws_wh)
AND  ws1.ws_order_number IN
(
SELECT wr_order_number
FROM   web_returns,
ws_wh
WHERE  wr_order_number = ws_wh.ws_order_number)
ORDER BY count(DISTINCT ws_order_number)
LIMIT 100

Exception:

java.sql.SQLException: SYSTEM ERROR: IllegalStateException: Record count not 
set for this vector container

Fragment 2:3

Please, refer to logs for more information.

[Error Id: 4ed92fce-505b-40ba-ac0e-4a302c28df47 on drill87:31010]

  (java.lang.IllegalStateException) Record count not set for this vector 
container

org.apache.drill.shaded.guava.com.google.common.base.Preconditions.checkState():459
org.apache.drill.exec.record.VectorContainer.getRecordCount():394
org.apache.drill.exec.record.RecordBatchSizer.<init>():720
org.apache.drill.exec.record.RecordBatchSizer.<init>():704

org.apache.drill.exec.physical.impl.common.HashTableTemplate$BatchHolder.getActualSize():462

org.apache.drill.exec.physical.impl.common.HashTableTemplate.getActualSize():964

org.apache.drill.exec.physical.impl.common.HashTableTemplate.makeDebugString():973

org.apache.drill.exec.physical.impl.common.HashPartition.makeDebugString():601

org.apache.drill.exec.physical.impl.join.HashJoinBatch.makeDebugString():1313

org.apache.drill.exec.physical.impl.join.HashJoinBatch.executeBuildPhase():1105
org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():525
org.apache.drill.exec.record.AbstractRecordBatch.next():186
org.apache.drill.exec.record.AbstractRecordBatch.next():126
org.apache.drill.exec.record.AbstractRecordBatch.next():116
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63

org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():141
org.apache.drill.exec.record.AbstractRecordBatch.next():186
org.apache.drill.exec.record.AbstractRecordBatch.next():126
org.apache.drill.exec.test.generated.HashAggregatorGen1068899.doWork():642
org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():296
org.apache.drill.exec.record.AbstractRecordBatch.next():186
org.apache.drill.exec.record.AbstractRecordBatch.next():126
org.apache.drill.exec.record.AbstractRecordBatch.next():116
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63

org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():141
org.apache.drill.exec.record.AbstractRecordBatch.next():186
org.apache.drill.exec.physical.impl.BaseRootExec.next():104

org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():93
org.apache.drill.exec.physical.impl.BaseRootExec.next():94
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():422
org.apache.hadoop.security.UserGroupInformation.doAs():1669
org.apache.drill.exec.work.fragment.FragmentExecutor.run():283
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1149
java.util.concurrent.ThreadPoolExecutor$Worker.run():624
java.lang.Thread.run():748

at 
org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:538)
at org.apache.drill.jdbc.impl.DrillCursor.next(DrillCursor.java:642)
at 
oadd.org.apache.calcite.avatica.AvaticaResultSet.next(AvaticaResultSet.java:217)
at 
org.apache.drill.jdbc.impl.DrillResultSetImpl.next(DrillResultSetImpl.java:148)

[GitHub] [drill] vvysotskyi edited a comment on issue #671: DRILL-4347: Propagate distinct row count for joins from logical plann…

2019-04-11 Thread GitBox

vvysotskyi edited a comment on issue #671: DRILL-4347: Propagate distinct row 
count for joins from logical plann…
URL: https://github.com/apache/drill/pull/671#issuecomment-482250847
 
 
   Ok, if this issue was already fixed, I agree that we can close this PR. 
Thanks for clarifying.



[GitHub] [drill] vdiravka commented on issue #1743: DRILL-7165: Redundant Checksum calculating for ASC files

2019-04-11 Thread GitBox

vdiravka commented on issue #1743: DRILL-7165: Redundant Checksum calculating 
for ASC files
URL: https://github.com/apache/drill/pull/1743#issuecomment-482254354
 
 
   @sohami PR for project root POM ArtifactId rename is created: #1746



[GitHub] [drill] vdiravka opened a new pull request #1746: DRILL-7169: Rename drill-root ArtifactID to apache-drill

2019-04-11 Thread GitBox

vdiravka opened a new pull request #1746: DRILL-7169: Rename drill-root 
ArtifactID to apache-drill
URL: https://github.com/apache/drill/pull/1746
 
 
   Change 'project.artifactId' from 'drill-root' to 'apache-drill'
   
   _Note: it includes changes for #1743 also (it is expected that PR will be 
merged first)_



[jira] [Created] (DRILL-7169) Rename drill-root ArtifactID to apache-drill

2019-04-11 Thread Vitalii Diravka (JIRA)

Vitalii Diravka created DRILL-7169:
--

 Summary: Rename drill-root ArtifactID to apache-drill
 Key: DRILL-7169
 URL: https://issues.apache.org/jira/browse/DRILL-7169
 Project: Apache Drill
  Issue Type: Improvement
Affects Versions: 1.15.0
Reporter: Vitalii Diravka
Assignee: Vitalii Diravka
 Fix For: Future


Rename the root POM ArtifactID from {{drill-root}} to {{apache-drill}}, see:
{{[https://github.com/apache/drill/blob/master/pom.xml#L32]}}

Most Apache projects use the short project name as the artifactId.
Renaming it to {{apache-drill}} allows it to be used as a variable in the Drill 
build process.




[GitHub] [drill] amansinha100 commented on issue #1744: Drill 7148 - Join order, multi-col ndv and aggregate rowcount fixes for TPCH queries

2019-04-11 Thread GitBox

amansinha100 commented on issue #1744: Drill 7148 - Join order, multi-col ndv 
and aggregate rowcount fixes for TPCH queries
URL: https://github.com/apache/drill/pull/1744#issuecomment-482245782
 
 
   @gparai the phase 1 aggregate related changes and the `TableScan` changes 
LGTM.  For the join cardinality, I would like to step back and look at the 
overall set of changes not just from this PR but prior ones to get a better 
understanding.  



[GitHub] [drill] amansinha100 commented on a change in pull request #1744: Drill 7148 - Join order, multi-col ndv and aggregate rowcount fixes for TPCH queries

2019-04-11 Thread GitBox

amansinha100 commented on a change in pull request #1744: Drill 7148 - Join 
order, multi-col ndv and aggregate rowcount fixes for TPCH queries
URL: https://github.com/apache/drill/pull/1744#discussion_r274569742
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillRelMdDistinctRowCount.java
 ##
 @@ -148,29 +154,59 @@ private Double getDistinctRowCount(DrillScanRelBase 
scan, RelMetadataQuery mq, D
   // Statistics cannot be obtained, use default behaviour
   return scan.estimateRowCount(mq) * 0.1;
 }
-double s = 1.0;
 
-for (int i = 0; i < groupKey.length(); i++) {
-  final String colName = type.getFieldNames().get(i);
-  // Skip NDV, if not available
-  if (!groupKey.get(i)) {
-continue;
+double s = 1.0;
+if (((int)PrelUtil.getPlannerSettings(
+scan.getCluster().getPlanner()).getStatisticsJoinCardinalityMode() & 
2) == 2) {
+  for (int i = 0; i < groupKey.length(); i++) {
+final String colName = type.getFieldNames().get(i);
+// Skip NDV, if not available
+if (!groupKey.get(i)) {
+  continue;
+}
+ColumnStatistics columnStatistics = tableMetadata != null ? 
tableMetadata.getColumnStatistics(SchemaPath.getSimplePath(colName)) : null;
+Double ndv = columnStatistics != null ? (Double) 
columnStatistics.getStatistic(ColumnStatisticsKind.NVD) : null;
 
 Review comment:
   `ColumnStatisticsKind.NVD` ==> NDV 



[GitHub] [drill] amansinha100 commented on issue #671: DRILL-4347: Propagate distinct row count for joins from logical plann…

2019-04-11 Thread GitBox

amansinha100 commented on issue #671: DRILL-4347: Propagate distinct row count 
for joins from logical plann…
URL: https://github.com/apache/drill/pull/671#issuecomment-482228863
 
 
   @vvysotskyi thanks for re-surfacing this.  From the JIRA comments, this JIRA 
was fixed by DRILL-4678.  I think we can close this PR.  Let me know what you 
think. 



[GitHub] [drill] vvysotskyi commented on issue #186: Parquet meta

2019-04-11 Thread GitBox

vvysotskyi commented on issue #186: Parquet meta
URL: https://github.com/apache/drill/pull/186#issuecomment-482199141
 
 
   Fixed in 1cfd4c20d30dd042290e769472e60d06ae66020c



[GitHub] [drill] vvysotskyi closed pull request #186: Parquet meta

2019-04-11 Thread GitBox

vvysotskyi closed pull request #186: Parquet meta
URL: https://github.com/apache/drill/pull/186
 
 
   



[GitHub] [drill] vvysotskyi closed pull request #234: DRILL-3423

2019-04-11 Thread GitBox

vvysotskyi closed pull request #234: DRILL-3423
URL: https://github.com/apache/drill/pull/234
 
 
   



[GitHub] [drill] vvysotskyi commented on issue #234: DRILL-3423

2019-04-11 Thread GitBox

vvysotskyi commented on issue #234: DRILL-3423
URL: https://github.com/apache/drill/pull/234#issuecomment-482198162
 
 
   Fixed in 46c0f2a4135450417dfebf52f11538f8926fd467



[GitHub] [drill] vvysotskyi commented on issue #177: Issues/drill 3791

2019-04-11 Thread GitBox

vvysotskyi commented on issue #177: Issues/drill 3791
URL: https://github.com/apache/drill/pull/177#issuecomment-482194818
 
 
   Fixed in #251



[GitHub] [drill] vvysotskyi closed pull request #177: Issues/drill 3791

2019-04-11 Thread GitBox

vvysotskyi closed pull request #177: Issues/drill 3791
URL: https://github.com/apache/drill/pull/177
 
 
   



[GitHub] [drill] vvysotskyi closed pull request #182: [DRILL-3791] JDBC Storage Plugin Testing

2019-04-11 Thread GitBox

vvysotskyi closed pull request #182: [DRILL-3791] JDBC Storage Plugin Testing
URL: https://github.com/apache/drill/pull/182
 
 
   



[GitHub] [drill] vvysotskyi removed a comment on issue #182: [DRILL-3791] JDBC Storage Plugin Testing

2019-04-11 Thread GitBox

vvysotskyi removed a comment on issue #182: [DRILL-3791] JDBC Storage Plugin 
Testing
URL: https://github.com/apache/drill/pull/182#issuecomment-482193522
 
 
   Fixed in https://github.com/apache/drill/pull/251



[GitHub] [drill] vvysotskyi commented on issue #182: [DRILL-3791] JDBC Storage Plugin Testing

2019-04-11 Thread GitBox

vvysotskyi commented on issue #182: [DRILL-3791] JDBC Storage Plugin Testing
URL: https://github.com/apache/drill/pull/182#issuecomment-482193522
 
 
   Fixed in https://github.com/apache/drill/pull/251



[GitHub] [drill] amansinha100 commented on a change in pull request #1744: Drill 7148 - Join order, multi-col ndv and aggregate rowcount fixes for TPCH queries

2019-04-11 Thread GitBox

amansinha100 commented on a change in pull request #1744: Drill 7148 - Join 
order, multi-col ndv and aggregate rowcount fixes for TPCH queries
URL: https://github.com/apache/drill/pull/1744#discussion_r274514756
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillRelMdDistinctRowCount.java
 ##
 @@ -229,4 +321,42 @@ public Double getDistinctRowCount(DrillJoinRelBase 
joinRel, RelMetadataQuery mq,
 }
 return RelMdUtil.numDistinctVals(distRowCount, mq.getRowCount(joinRel));
   }
+
+  private ImmutableBitSet getSingleGbyKey(ImmutableBitSet groupKey, int idx) {
+if (groupKey.get(idx)) {
+  return ImmutableBitSet.builder().set(idx, idx+1).build();
+} else {
+  return null;
+}
+  }
+
+  private double getPredSelectivityContainingInputRef(RexNode predicate, int 
inputRef,
+  RelMetadataQuery mq, TableScan scan) {
+if (predicate instanceof RexCall) {
+  if (predicate.getKind() == SqlKind.AND) {
+double sel, andSel = 1.0;
+for (RexNode op : ((RexCall) predicate).getOperands()) {
+  sel = getPredSelectivityContainingInputRef(op, inputRef, mq, scan);
+  andSel *= sel > 0 ? sel : 1.0;
 
 Review comment:
   `sel > 0` already includes the value 1.0 .. can you clarify this setting ?
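
   For reference, a minimal Python sketch of what the quoted loop computes. 
The function name and_selectivity is hypothetical, and reading a non-positive 
child value as "no estimate available" is an assumption, not something the 
PR confirms.

```python
def and_selectivity(child_selectivities):
    """Combine the selectivities of the operands of an AND predicate."""
    combined = 1.0
    for sel in child_selectivities:
        # Mirrors `andSel *= sel > 0 ? sel : 1.0`: a non-positive value
        # leaves the running product unchanged (assumed "no estimate").
        combined *= sel if sel > 0 else 1.0
    return combined


print(and_selectivity([0.5, 0.0, 1.0]))  # 0.5
```

A child selectivity of exactly 1.0 passes the `sel > 0` test and multiplies 
the product by 1.0, so it gives the same result as the fallback branch.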



[GitHub] [drill] amansinha100 commented on a change in pull request #1744: Drill 7148 - Join order, multi-col ndv and aggregate rowcount fixes for TPCH queries

2019-04-11 Thread GitBox

amansinha100 commented on a change in pull request #1744: Drill 7148 - Join 
order, multi-col ndv and aggregate rowcount fixes for TPCH queries
URL: https://github.com/apache/drill/pull/1744#discussion_r274512764
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillRelMdDistinctRowCount.java
 ##
 @@ -208,8 +244,64 @@ public Double getDistinctRowCount(DrillJoinRelBase 
joinRel, RelMetadataQuery mq,
 Double leftDistRowCount = null;
 Double rightDistRowCount = null;
 double distRowCount = 1;
+int gbyCols = 0;
 ImmutableBitSet lmb = leftMask.build();
 ImmutableBitSet rmb = rightMask.build();
+PlannerSettings plannerSettings = 
PrelUtil.getPlannerSettings(joinRel.getCluster().getPlanner());
+if (((int)plannerSettings.getStatisticsJoinCardinalityMode() & 4) == 4) {
+/*
+ * The NDV for a multi-column GBY key past a join is determined as follows:
+ * GBY(s1, s2, s3) = CNDV(s1)*CNDV(s2)*CNDV(s3)
+ * where CNDV is determined as follows:
+ * A) If sX is present as a join column (sX = tX) CNDV(sX) = MIN(NDV(sX), 
NDV(tX)) where X =1, 2, 3, etc
+ * B) Otherwise, based on independence assumption CNDV(sX) = NDV(sX)
+ */
+  Set<ImmutableBitSet> joinFiltersSet = new HashSet<>();
+  for (RexNode filter : joinFilters) {
+final RelOptUtil.InputFinder inputFinder = 
RelOptUtil.InputFinder.analyze(filter);
+joinFiltersSet.add(inputFinder.inputBitSet.build());
+  }
+  for (int idx = 0; idx < groupKey.length(); idx++) {
+if (groupKey.get(idx)) {
+  // GBY key is present in some filter - now try options A) and B) as 
described above
+  double ndvSGby = 0;
+  ImmutableBitSet sGby = getSingleGbyKey(groupKey, idx);
+  if (sGby != null) {
+for (ImmutableBitSet jFilter : joinFiltersSet) {
 
 Review comment:
   Suppose the query is  ' SELECT  distinct a1 FROM t1, t2, t3  where t1.a1 = 
t2.a2 and t1.a1 = t2.a3 '
   In this case, the grouping key is involved in 2 joins .. can we do an 
offline walk-through to see whether the top join's NDV is being used. 



[GitHub] [drill] amansinha100 commented on a change in pull request #1744: Drill 7148 - Join order, multi-col ndv and aggregate rowcount fixes for TPCH queries

2019-04-11 Thread GitBox

amansinha100 commented on a change in pull request #1744: Drill 7148 - Join 
order, multi-col ndv and aggregate rowcount fixes for TPCH queries
URL: https://github.com/apache/drill/pull/1744#discussion_r274503913
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillRelMdDistinctRowCount.java
 ##
 @@ -80,10 +87,10 @@ public Double getDistinctRowCount(Join rel, 
RelMetadataQuery mq,
 
   @Override
   public Double getDistinctRowCount(RelNode rel, RelMetadataQuery mq, 
ImmutableBitSet groupKey, RexNode predicate) {
-if (rel instanceof DrillScanRelBase) {  // Applies to both 
Drill Logical and Physical Rels
+if (rel instanceof TableScan) { // Applies to both 
Drill Logical and Physical Rels
 
 Review comment:
   Comment should indicate that it applies to Calcite logical table scan also 
since that seems to be the purpose of this change. 



[GitHub] [drill] amansinha100 commented on a change in pull request #1744: Drill 7148 - Join order, multi-col ndv and aggregate rowcount fixes for TPCH queries

2019-04-11 Thread GitBox

amansinha100 commented on a change in pull request #1744: Drill 7148 - Join 
order, multi-col ndv and aggregate rowcount fixes for TPCH queries
URL: https://github.com/apache/drill/pull/1744#discussion_r274498821
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillRelMdRowCount.java
 ##
 @@ -53,7 +53,21 @@ public Double getRowCount(Aggregate rel, RelMetadataQuery mq) {
 
 if (groupKey.isEmpty()) {
   return 1.0;
-} else {
+} else if (rel instanceof AggPrelBase &&
+((AggPrelBase) rel).getOperatorPhase() == AggPrelBase.OperatorPhase.PHASE_1of2) {
+  // Phase 1 Aggregate would return rows in the range [NDV, input_rows]. Hence, use the
+  // existing estimate of 1/10 * input_rows
+Double distinctRowCount = mq.getRowCount(rel.getInput()) / 10;
 
 Review comment:
   Can we get this default value from an existing Calcite utility instead? 
That would keep it in sync if the ratio changes in the future.
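
The branch under review can be read as the estimate below. This is a minimal sketch in Python rather than Drill's Java; `estimate_agg_row_count`, the `phase` strings and the `ndv` argument are illustrative names, not Drill or Calcite APIs. Only the first two branches come from the diff. The final branch is an assumption about what the elided `else` does.

```python
from typing import Optional

# Divisor behind the flat "1/10 of input rows" guess quoted in the diff.
PHASE1_DIVISOR = 10


def estimate_agg_row_count(input_rows: float,
                           group_key_empty: bool,
                           phase: str,
                           ndv: Optional[float] = None) -> float:
    """Illustrative row-count estimate for an aggregate; not Drill's actual code."""
    if group_key_empty:
        # No grouping columns: the aggregate emits exactly one row.
        return 1.0
    if phase == "PHASE_1of2":
        # A phase-1 aggregate emits between NDV and input_rows rows,
        # so the diff keeps the flat input_rows / 10 guess.
        return input_rows / PHASE1_DIVISOR
    # Assumption: other phases use the NDV of the grouping key when it is
    # known, and otherwise fall back to the same flat guess.
    return ndv if ndv is not None else input_rows / PHASE1_DIVISOR
```

Keeping the divisor in one named constant is the point of the review comment: if the ratio came from a shared utility, a change there would reach this branch too.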



[GitHub] [drill] amansinha100 commented on a change in pull request #1744: Drill 7148 - Join order, multi-col ndv and aggregate rowcount fixes for TPCH queries

2019-04-11 Thread GitBox

amansinha100 commented on a change in pull request #1744: Drill 7148 - Join 
order, multi-col ndv and aggregate rowcount fixes for TPCH queries
URL: https://github.com/apache/drill/pull/1744#discussion_r274501178
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillRelOptUtil.java
 ##
 @@ -667,4 +667,66 @@ public static DrillTable getDrillTable(final TableScan scan) {
 }
 return drillTable;
   }
+
+  public static List> analyzeSimpleEquiJoin(Join join) {
+List> joinConditions = new ArrayList<>();
+try {
+  RexVisitor<Void> visitor =
+  new RexVisitorImpl<Void>(true) {
+public Void visitCall(RexCall call) {
+  if (call.getKind() == SqlKind.AND || call.getKind() == SqlKind.OR) {
+super.visitCall(call);
+  } else {
+if (call.getKind() == SqlKind.EQUALS) {
+  int leftFieldCount = join.getLeft().getRowType().getFieldCount();
+  int rightFieldCount = join.getRight().getRowType().getFieldCount();
+  RexNode leftComparand = call.operands.get(0);
+  RexNode rightComparand = call.operands.get(1);
+  RexInputRef leftFieldAccess = (RexInputRef) leftComparand;
+  RexInputRef rightFieldAccess = (RexInputRef) rightComparand;
+  if (leftFieldAccess.getIndex() >= leftFieldCount + rightFieldCount ||
 
 Review comment:
   When will this condition happen, i.e. when is the join column's index out 
of range?
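
For context on the question: in a Calcite join condition, an input reference indexes into the left input's fields followed by the right input's fields. The sketch below, in Python and with the hypothetical helper name `split_join_ref`, shows that convention and the out-of-range case the quoted check appears to guard against. It is an illustration, not the code in the PR.

```python
from typing import Tuple


def split_join_ref(index: int, left_count: int, right_count: int) -> Tuple[str, int]:
    """Map an input-ref index in a join condition to (side, index within that side)."""
    if index < 0 or index >= left_count + right_count:
        # Outside the concatenated left + right row type.
        raise IndexError(f"input ref {index} is outside the join row")
    if index < left_count:
        return ("left", index)
    # Right-side fields are numbered after all left-side fields.
    return ("right", index - left_count)
```

With 3 left fields and 4 right fields, index 2 is the last left field and index 3 is the first right field. Any index of 7 or more refers to no field at all, which a well-formed join condition should never contain.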



[GitHub] [drill] vvysotskyi commented on issue #182: [DRILL-3791] JDBC Storage Plugin Testing

2019-04-11 Thread GitBox

vvysotskyi commented on issue #182: [DRILL-3791] JDBC Storage Plugin Testing
URL: https://github.com/apache/drill/pull/182#issuecomment-482193030
 
 
   Fixed in https://github.com/apache/drill/pull/251



[jira] [Created] (DRILL-7168) Implement ALTER TABLE SCHEMA ADD / REMOVE COLUMN / PROPERTY commands

2019-04-11 Thread Arina Ielchiieva (JIRA)

Arina Ielchiieva created DRILL-7168:
---

 Summary: Implement ALTER TABLE SCHEMA ADD / REMOVE COLUMN / 
PROPERTY commands
 Key: DRILL-7168
 URL: https://issues.apache.org/jira/browse/DRILL-7168
 Project: Apache Drill
  Issue Type: Sub-task
Reporter: Arina Ielchiieva
 Fix For: 1.17.0


By [~Paul.Rogers]:
{quote}
Sooner or later users are going to ask for a command to update just the 
properties, or just add or remove a column, without having to spell out the 
entire new schema. ALTER TABLE SCHEMA ADD/REMOVE COLUMN/PROPERTY ...
{quote}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[GitHub] [drill] dvjyothsna commented on issue #1745: DRILL-7166: Count query with wildcard should skip reading of metadata summary file

2019-04-11 Thread GitBox

dvjyothsna commented on issue #1745: DRILL-7166: Count query with wildcard 
should skip reading of metadata summary file
URL: https://github.com/apache/drill/pull/1745#issuecomment-482179689
 
 
   @amansinha100 I have changed the unit test. Can you please take a look at it?
   



[GitHub] [drill] vvysotskyi closed pull request #573: DRILL-4858 : Fix REPEATED_COUNT on JSON containing an array of maps

2019-04-11 Thread GitBox

vvysotskyi closed pull request #573: DRILL-4858 : Fix REPEATED_COUNT on JSON 
containing an array of maps
URL: https://github.com/apache/drill/pull/573
 
 
   



[GitHub] [drill] vvysotskyi commented on issue #573: DRILL-4858 : Fix REPEATED_COUNT on JSON containing an array of maps

2019-04-11 Thread GitBox

vvysotskyi commented on issue #573: DRILL-4858 : Fix REPEATED_COUNT on JSON 
containing an array of maps
URL: https://github.com/apache/drill/pull/573#issuecomment-482171598
 
 
   This issue was fixed in https://github.com/apache/drill/pull/1641



[GitHub] [drill] vvysotskyi commented on issue #606: Drill-1328: Compute and use statistics in Drill

2019-04-11 Thread GitBox

vvysotskyi commented on issue #606: Drill-1328: Compute and use statistics in 
Drill
URL: https://github.com/apache/drill/pull/606#issuecomment-482170794
 
 
   @gparai, another PR for DRILL-1328 was merged. Can this PR be closed, or 
are there other changes that should still be merged?



Re: Query Question

2019-04-11 Thread Aman Sinha

> I thought flatten() would be the answer, however, if I flatten the
columns, I get the following result:

Regarding the flatten() output, this is expected because doing a 'SELECT
flatten(a),  flatten(b) FROM T'  is equivalent to doing a cross-product of
the 2 arrays.

In your example, both arrays are the same length, but what would you expect
the output to be if they were different? I don't see a direct SQL way of
doing it, but even with UDFs the semantics should be defined.

Aman
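
The difference between the cross product and a positional pairing, and the choice that unequal lengths force, can be sketched in Python. This illustrates the semantics only; it is not Drill behaviour or a Drill API, and the sample arrays are taken from the question quoted below.

```python
from itertools import product, zip_longest

a = [1, 2, 3]
b = [4, 5, 6]

# Flattening both arrays independently pairs every a with every b: 9 rows.
cross = list(product(a, b))

# Pairing by position, as Python's zip does: 3 rows.
paired = list(zip(a, b))

# With unequal lengths a choice is needed:
short = [4, 5]
truncated = list(zip(a, short))       # drop the tail of the longer array
padded = list(zip_longest(a, short))  # pad the shorter array with None
```

Either choice is defensible, which is why the semantics need to be stated before a UDF is written.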

On Thu, Apr 11, 2019 at 6:37 AM Charles Givre  wrote:

> That’s a good idea.  I’ll work on an equivalent ZIP() function and submit
> as a separate PR.
> — C
>
> > On Apr 10, 2019, at 20:44, Paul Rogers 
> wrote:
> >
> > Hi Charles,
> >
> > In Python [1], the "zip" function does this task:
> >
> >
> > zip([1, 2, 3], [4, 5, 6]) --> [(1, 4), (2, 5), (3, 6)]
> >
> >
> > When you gathered the list of functions for the Drill book, did you come
> across anything like this in Drill? I presume you didn't, hence the
> question. I did a quick (incomplete) check and didn't see any likely
> candidates.
> >
> > Perhaps you could create such a function.
> >
> > Once you have the zipped result, you could flatten to get the pairs as
> rows.
> >
> >
> > Thanks,
> > - Paul
> >
> >
> >
> >On Wednesday, April 10, 2019, 5:26:10 PM PDT, Charles Givre <
> cgi...@gmail.com> wrote:
> >
> > Hello Drillers,
> > I have a query question for you.  I have some really ugly data that has
> a field like this:
> >
> > compound_field : { “field_1”: [1,2,3],
> > “field_2”:[4,5,6]
> > }
> >
> > I would like to map fields 1 and 2 to columns so that the end result is:
> >
> > field1 | field2
> > 1  |  4
> > 2  |  5
> > 3  |  6
> >
> > I thought flatten() would be the answer, however, if I flatten the
> columns, I get the following result:
> >
> > field1 | field2
> > 1  |  4
> > 1  |  5
> > 1  |  6
> >
> > Does anyone have any suggestions?
> > Thanks,
> > —C
>
>

Re: Query Question

2019-04-11 Thread Charles Givre

That’s a good idea.  I’ll work on an equivalent ZIP() function and submit as a 
separate PR.
— C
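
As a rough picture of what such a function would have to produce for the `compound_field` example quoted below, here is a minimal Python sketch. The name `zip_fields` is hypothetical, and stopping at the shortest array is one possible choice rather than a settled behaviour.

```python
from typing import Any, Dict, List


def zip_fields(record: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Turn a map of arrays into one row per array position."""
    names = list(record)
    # zip stops at the shortest array; that truncation is a choice, not a given.
    return [dict(zip(names, values)) for values in zip(*record.values())]


rows = zip_fields({"field_1": [1, 2, 3], "field_2": [4, 5, 6]})
```

Each element of `rows` is one output row, so flattening the zipped result would give the three-row table asked for in the original question.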

> On Apr 10, 2019, at 20:44, Paul Rogers  wrote:
> 
> Hi Charles,
> 
> In Python [1], the "zip" function does this task:
> 
> 
> zip([1, 2, 3], [4, 5, 6]) --> [(1, 4), (2, 5), (3, 6)]
> 
> 
> When you gathered the list of functions for the Drill book, did you come 
> across anything like this in Drill? I presume you didn't, hence the question. 
> I did a quick (incomplete) check and didn't see any likely candidates.
> 
> Perhaps you could create such a function.
> 
> Once you have the zipped result, you could flatten to get the pairs as rows.
> 
> 
> Thanks,
> - Paul
> 
> 
> 
>On Wednesday, April 10, 2019, 5:26:10 PM PDT, Charles Givre 
>  wrote:  
> 
> Hello Drillers,
> I have a query question for you.  I have some really ugly data that has a 
> field like this:
> 
> compound_field : { “field_1”: [1,2,3],
> “field_2”:[4,5,6]
> }
> 
> I would like to map fields 1 and 2 to columns so that the end result is:
> 
> field1 | field2
> 1  |  4
> 2  |  5
> 3  |  6
> 
> I thought flatten() would be the answer, however, if I flatten the columns, I 
> get the following result:
> 
> field1 | field2
> 1  |  4
> 1  |  5
> 1  |  6
> 
> Does anyone have any suggestions?
> Thanks,
> —C

[jira] [Created] (DRILL-7167) DESCRIBE TABLE statement is not implemented

2019-04-11 Thread Dmytriy Grinchenko (JIRA)

Dmytriy Grinchenko created DRILL-7167:
-

 Summary: DESCRIBE TABLE statement is not implemented 
 Key: DRILL-7167
 URL: https://issues.apache.org/jira/browse/DRILL-7167
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.16.0
Reporter: Dmytriy Grinchenko
Assignee: Dmytriy Grinchenko
 Fix For: 1.17.0


DESCRIBE dfs.tmp.`table` - works fine 

DESCRIBE TABLE dfs.tmp.`table` - fails with error:
{code:java}
//todo
{code}

DESCRIBE TABLE should work the same as DESCRIBE;




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
