[jira] [Commented] (DRILL-8200) Update hadoop-common to ≥ 3.2.3 for CVE-2022-26612

2022-04-26 Thread Ted Dunning (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528249#comment-17528249
 ] 

Ted Dunning commented on DRILL-8200:


My reading of the CVE indicates that it applies only on Windows.

Do others read it the same way?

> Update hadoop-common to ≥ 3.2.3 for CVE-2022-26612
> --
>
> Key: DRILL-8200
> URL: https://issues.apache.org/jira/browse/DRILL-8200
> Project: Apache Drill
>  Issue Type: Bug
>  Components: library
>Affects Versions: 1.20.0
>Reporter: James Turton
>Assignee: James Turton
>Priority: Critical
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (DRILL-7949) documentation error - missing link

2021-06-05 Thread Ted Dunning (Jira)
Ted Dunning created DRILL-7949:
--

 Summary: documentation error - missing link
 Key: DRILL-7949
 URL: https://issues.apache.org/jira/browse/DRILL-7949
 Project: Apache Drill
  Issue Type: Task
Reporter: Ted Dunning


In checking rc1 for 1.19, I noted that this page:

[https://drill.apache.org/docs/configuring-storage-plugins/]

has a link labeled "Start the web UI" that points to 
[https://drill.apache.org/docs/starting-the-web-console/]

and that page does not exist.

I think that link should go to 
[https://drill.apache.org/docs/starting-the-web-ui/]

 





[jira] [Commented] (DRILL-7277) Bug in planner with redundant order-by

2019-05-25 Thread Ted Dunning (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16848071#comment-16848071
 ] 

Ted Dunning commented on DRILL-7277:


This query:
{code}
select row_number() over (order by department_id desc) r, department_id
from (select department_id
      from cp.`employee.json`
      order by department_id desc);
{code}

blows up as shown below, but putting department_id first in the output list doesn't.

java.sql.SQLException: [MapR][DrillJDBCDriver](500165) Query execution error. 
Details: SYSTEM ERROR: CannotPlanException: Node 
[rel#26937:Subset#4.LOGICAL.ANY([]).[1 DESC]] could not be implemented; planner 
state:

Root: rel#26937:Subset#4.LOGICAL.ANY([]).[1 DESC]
Original rel:
LogicalProject(subset=[rel#26937:Subset#4.LOGICAL.ANY([]).[1 DESC]], r=[$1], 
department_id=[$0]): rowcount = 100.0, cumulative cost = {100.0 rows, 200.0 
cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 26935
  LogicalWindow(subset=[rel#26934:Subset#3.NONE.ANY([]).[1 DESC]], 
window#0=[window(partition {} order by [0 DESC] rows between UNBOUNDED 
PRECEDING and CURRENT ROW aggs [ROW_NUMBER()])]): rowcount = 100.0, cumulative 
cost = {100.0 rows, 200.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 26933
LogicalSort(subset=[rel#26932:Subset#2.NONE.ANY([]).[0 DESC]], sort0=[$0], 
dir0=[DESC]): rowcount = 100.0, cumulative cost = {100.0 rows, 
1842.0680743952366 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 26931
  LogicalProject(subset=[rel#26930:Subset#1.NONE.ANY([]).[]], 
department_id=[$1]): rowcount = 100.0, cumulative cost = {100.0 rows, 100.0 
cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 26929
EnumerableTableScan(subset=[rel#26928:Subset#0.ENUMERABLE.ANY([]).[]], 
table=[[cp, employee.json]]): rowcount = 100.0, cumulative cost = {100.0 rows, 
101.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 26880

Sets:
Set#0, type: RecordType(DYNAMIC_STAR **, ANY department_id)
rel#26928:Subset#0.ENUMERABLE.ANY([]).[], best=rel#26880, 
importance=0.59049001
rel#26880:EnumerableTableScan.ENUMERABLE.ANY([]).[](table=[cp, 
employee.json]), rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 
io, 0.0 network, 0.0 memory}
rel#26952:Subset#0.LOGICAL.ANY([]).[], best=rel#26954, 
importance=0.3247695
rel#26954:DrillScanRel.LOGICAL.ANY([]).[](table=[cp, 
employee.json],groupscan=EasyGroupScan [selectionRoot=classpath:/employee.json, 
numFiles=1, columns=[`**`, `department_id`], 
files=[classpath:/employee.json]]), rowcount=463.0, cumulative cost={463.0 
rows, 463.0 cpu, 0.0 io, 0.0 network, 0.0 memory}
Set#1, type: RecordType(ANY department_id)
rel#26930:Subset#1.NONE.ANY([]).[], best=null, importance=0.6561

rel#26929:LogicalProject.NONE.ANY([]).[](input=rel#26928:Subset#0.ENUMERABLE.ANY([]).[],department_id=$1),
 rowcount=100.0, cumulative cost={inf}
rel#26931:LogicalSort.NONE.ANY([]).[0 
DESC](input=rel#26930:Subset#1.NONE.ANY([]).[],sort0=$0,dir0=DESC), 
rowcount=100.0, cumulative cost={inf}
rel#26943:Subset#1.LOGICAL.ANY([]).[], best=rel#26950, importance=0.405
rel#26944:DrillSortRel.LOGICAL.ANY([]).[0 
DESC](input=rel#26943:Subset#1.LOGICAL.ANY([]).[],sort0=$0,dir0=DESC), 
rowcount=463.0, cumulative cost={926.0 rows, 11830.070504167705 cpu, 0.0 io, 
0.0 network, 0.0 memory}
rel#26950:DrillScanRel.LOGICAL.ANY([]).[](table=[cp, 
employee.json],groupscan=EasyGroupScan [selectionRoot=classpath:/employee.json, 
numFiles=1, columns=[`department_id`], files=[classpath:/employee.json]]), 
rowcount=463.0, cumulative cost={463.0 rows, 463.0 cpu, 0.0 io, 0.0 network, 
0.0 memory}

rel#26953:DrillProjectRel.LOGICAL.ANY([]).[](input=rel#26952:Subset#0.LOGICAL.ANY([]).[],department_id=$1),
 rowcount=463.0, cumulative cost={926.0 rows, 4630463.0 cpu, 0.0 io, 0.0 
network, 0.0 memory}
rel#26946:Subset#1.NONE.ANY([]).[0 DESC], best=null, 
importance=0.7291
rel#26931:LogicalSort.NONE.ANY([]).[0 
DESC](input=rel#26930:Subset#1.NONE.ANY([]).[],sort0=$0,dir0=DESC), 
rowcount=100.0, cumulative cost={inf}
rel#26947:Subset#1.LOGICAL.ANY([]).[1 DESC], best=null, importance=0.81
rel#26948:Subset#1.LOGICAL.ANY([]).[0 DESC], best=rel#26944, 
importance=0.405
rel#26944:DrillSortRel.LOGICAL.ANY([]).[0 
DESC](input=rel#26943:Subset#1.LOGICAL.ANY([]).[],sort0=$0,dir0=DESC), 
rowcount=463.0, cumulative cost={926.0 rows, 11830.070504167705 cpu, 0.0 io, 
0.0 network, 0.0 memory}
Set#3, type: RecordType(ANY department_id, BIGINT w0$o0)
rel#26934:Subset#3.NONE.ANY([]).[1 DESC], best=null, importance=0.81
rel#26933:LogicalWindow.NONE.ANY([]).[[1 
DESC]](input=rel#26946:Subset#1.NONE.ANY([]).[0 DESC],window#0=window(partition 
{} order by [0 DESC] rows between UNBOUNDED PRECEDING and CURRENT ROW aggs 
[ROW_NUMBER()])), 

[jira] [Created] (DRILL-7277) Bug in planner with redundant order-by

2019-05-25 Thread Ted Dunning (JIRA)
Ted Dunning created DRILL-7277:
--

 Summary: Bug in planner with redundant order-by
 Key: DRILL-7277
 URL: https://issues.apache.org/jira/browse/DRILL-7277
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.14.0
Reporter: Ted Dunning








[jira] [Commented] (DRILL-4223) PIVOT and UNPIVOT to rotate table valued expressions

2018-03-06 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16388637#comment-16388637
 ] 

Ted Dunning commented on DRILL-4223:


 

This is actually related to list_aggregate and to some kind of inverse of 
flatten/unnest. My guess is that if we had a JSON constructor, this would be 
just about as good. The idea would be that columns could be specified to 
determine the key and value in an object. Aggregation would be the final step 
to get what John wants. Aggregation over structures is an open question since 
you don't necessarily know the keys in a structure. It would be nice to be able 
to apply an aggregation function to all members of the structure without 
knowing which members exist.

 

 

> PIVOT and UNPIVOT to rotate table valued expressions
> 
>
> Key: DRILL-4223
> URL: https://issues.apache.org/jira/browse/DRILL-4223
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Execution - Codegen, SQL Parser
>Reporter: Ashwin Aravind
>Priority: Major
>
> Capability to PIVOT and UNPIVOT table values expressions which are results of 
> a SELECT query





[jira] [Commented] (DRILL-6190) Packets can be bigger than strictly legal

2018-03-01 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382894#comment-16382894
 ] 

Ted Dunning commented on DRILL-6190:


Wasn't this already reviewed? The changes since then are trivial. Same for
DRILL-6191.




> Packets can be bigger than strictly legal
> -
>
> Key: DRILL-6190
> URL: https://issues.apache.org/jira/browse/DRILL-6190
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Ted Dunning
>Assignee: Ted Dunning
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>
> Packets, especially those generated by malware, can be bigger than the legal 
> limit for IP. The fix is to leave 64kB padding in the buffers instead of 9kB.
>  
>  





[jira] [Commented] (DRILL-6191) Need more information on TCP flags

2018-02-28 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16381466#comment-16381466
 ] 

Ted Dunning commented on DRILL-6191:


Fixed the test to release results. Updated pull request. This pull may now 
conflict with DRILL-6190, but probably not.

> Need more information on TCP flags
> --
>
> Key: DRILL-6191
> URL: https://issues.apache.org/jira/browse/DRILL-6191
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Ted Dunning
>Assignee: Ted Dunning
>Priority: Major
> Fix For: 1.13.0
>
>
>  
> This is a small fix based on input from Charles Givre





[jira] [Commented] (DRILL-6190) Packets can be bigger than strictly legal

2018-02-28 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16381429#comment-16381429
 ] 

Ted Dunning commented on DRILL-6190:


Travis build is fixed: [build #5031 passed|https://travis-ci.org/apache/drill/builds/347567906] (ran for 43 min 7 sec).

> Packets can be bigger than strictly legal
> -
>
> Key: DRILL-6190
> URL: https://issues.apache.org/jira/browse/DRILL-6190
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Ted Dunning
>Assignee: Ted Dunning
>Priority: Major
> Fix For: 1.13.0
>
>
> Packets, especially those generated by malware, can be bigger than the legal 
> limit for IP. The fix is to leave 64kB padding in the buffers instead of 9kB.
>  
>  





[jira] [Updated] (DRILL-6191) Need more information on TCP flags

2018-02-27 Thread Ted Dunning (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Dunning updated DRILL-6191:
---
Fix Version/s: 1.13.0

> Need more information on TCP flags
> --
>
> Key: DRILL-6191
> URL: https://issues.apache.org/jira/browse/DRILL-6191
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Ted Dunning
>Priority: Major
> Fix For: 1.13.0
>
>
>  
> This is a small fix based on input from Charles Givre





[jira] [Updated] (DRILL-6190) Packets can be bigger than strictly legal

2018-02-27 Thread Ted Dunning (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Dunning updated DRILL-6190:
---
Fix Version/s: 1.13.0

> Packets can be bigger than strictly legal
> -
>
> Key: DRILL-6190
> URL: https://issues.apache.org/jira/browse/DRILL-6190
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Ted Dunning
>Priority: Major
> Fix For: 1.13.0
>
>
> Packets, especially those generated by malware, can be bigger than the legal 
> limit for IP. The fix is to leave 64kB padding in the buffers instead of 9kB.
>  
>  





[jira] [Commented] (DRILL-6191) Need more information on TCP flags

2018-02-27 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16378816#comment-16378816
 ] 

Ted Dunning commented on DRILL-6191:


Created pull request for this

> Need more information on TCP flags
> --
>
> Key: DRILL-6191
> URL: https://issues.apache.org/jira/browse/DRILL-6191
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Ted Dunning
>Priority: Major
>
>  
> This is a small fix based on input from Charles Givre





[jira] [Commented] (DRILL-6190) Packets can be bigger than strictly legal

2018-02-27 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16378817#comment-16378817
 ] 

Ted Dunning commented on DRILL-6190:


Created pull request for this.

> Packets can be bigger than strictly legal
> -
>
> Key: DRILL-6190
> URL: https://issues.apache.org/jira/browse/DRILL-6190
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Ted Dunning
>Priority: Major
>
> Packets, especially those generated by malware, can be bigger than the legal 
> limit for IP. The fix is to leave 64kB padding in the buffers instead of 9kB.
>  
>  





[jira] [Created] (DRILL-6191) Need more information on TCP flags

2018-02-27 Thread Ted Dunning (JIRA)
Ted Dunning created DRILL-6191:
--

 Summary: Need more information on TCP flags
 Key: DRILL-6191
 URL: https://issues.apache.org/jira/browse/DRILL-6191
 Project: Apache Drill
  Issue Type: Bug
Reporter: Ted Dunning


 

This is a small fix based on input from Charles Givre
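For background on what the flags field contains: the TCP flag bits live in byte 13 of the TCP header, with positions fixed by the TCP specification. A minimal, self-contained sketch of decoding them (the class and method names here are illustrative, not Drill's actual reader API):

```java
import java.util.ArrayList;
import java.util.List;

// Decode the one-byte TCP flags field (byte 13 of the TCP header).
// The bit positions below are fixed by RFC 793.
public class TcpFlags {
    public static final int FIN = 0x01;
    public static final int SYN = 0x02;
    public static final int RST = 0x04;
    public static final int PSH = 0x08;
    public static final int ACK = 0x10;
    public static final int URG = 0x20;

    // Return the names of the flags set in the given flags byte.
    public static List<String> decode(int flags) {
        List<String> names = new ArrayList<>();
        if ((flags & FIN) != 0) names.add("FIN");
        if ((flags & SYN) != 0) names.add("SYN");
        if ((flags & RST) != 0) names.add("RST");
        if ((flags & PSH) != 0) names.add("PSH");
        if ((flags & ACK) != 0) names.add("ACK");
        if ((flags & URG) != 0) names.add("URG");
        return names;
    }

    public static void main(String[] args) {
        // 0x12 = SYN+ACK, the second step of the three-way handshake.
        System.out.println(decode(0x12));
    }
}
```

Exposing the individual flag bits as separate columns (or as a decoded list like this) is the kind of per-flag detail the issue asks for.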





[jira] [Created] (DRILL-6190) Packets can be bigger than strictly legal

2018-02-27 Thread Ted Dunning (JIRA)
Ted Dunning created DRILL-6190:
--

 Summary: Packets can be bigger than strictly legal
 Key: DRILL-6190
 URL: https://issues.apache.org/jira/browse/DRILL-6190
 Project: Apache Drill
  Issue Type: Bug
Reporter: Ted Dunning


Packets, especially those generated by malware, can be bigger than the legal 
limit for IP. The fix is to leave 64kB padding in the buffers instead of 9kB.
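The arithmetic behind the fix can be sketched directly. The 16-bit total-length field in an IP header can claim up to 65,535 bytes, so a buffer padded for roughly 9 kB (presumably sized for jumbo frames) can be overrun by a maximal or malformed packet. The names below are illustrative, not Drill's actual code:

```java
// Why 64 kB of padding is needed: a packet may claim any length the
// 16-bit IP total-length field can express, regardless of real MTUs.
public class PacketPadding {
    static final int MAX_IP_PACKET = 65_535;   // 2^16 - 1
    static final int OLD_PADDING = 9 * 1024;   // jumbo-frame sized
    static final int NEW_PADDING = 64 * 1024;  // covers any claimed length

    // Would a packet of the claimed length fit within the padding?
    static boolean fits(int claimedLength, int padding) {
        return claimedLength >= 0 && claimedLength <= padding;
    }

    public static void main(String[] args) {
        System.out.println(fits(MAX_IP_PACKET, OLD_PADDING));  // false
        System.out.println(fits(MAX_IP_PACKET, NEW_PADDING));  // true
    }
}
```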

 

 





[jira] [Created] (DRILL-6067) Add acknowledgement sequence number and flags to TCP fields

2018-01-02 Thread Ted Dunning (JIRA)
Ted Dunning created DRILL-6067:
--

 Summary: Add acknowledgement sequence number and flags to TCP 
fields
 Key: DRILL-6067
 URL: https://issues.apache.org/jira/browse/DRILL-6067
 Project: Apache Drill
  Issue Type: Bug
Reporter: Ted Dunning
Priority: Minor








[jira] [Commented] (DRILL-5957) Wire protocol versioning, version negotiation

2017-11-12 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16249063#comment-16249063
 ] 

Ted Dunning commented on DRILL-5957:



This suggestion has the virtue that only breaking changes will cause a version 
update, but it still has the problem that the version has to move no matter 
what part of the protocol changes. This is reminiscent of the old CORBA 
versioning nightmares.

Also, is there really any way to negotiate the value vector format without 
inserting a reformatting step that carries a fairly catastrophic performance hit?

I don't see a consideration of the cost of maintaining old-version 
compatibility, either. If old client versions keep working, there will be no 
incentive to upgrade. That will increase pressure to keep adding 
multiple-protocol support to the server and will seemingly block real progress 
just as much as client/server lockstepping does.

It seems that the short-term desire here is to allow the vector format to 
change. What about making the current dvector parts optional and adding 
alternative (optional) dvector parts in new formats? This effectively allows 
versioning of only the dvector parts, leaving the rest of the protocol to be 
soft-versioned as it is currently. The client's advertised version could be 
used to trigger one format or the other, and the incentive to upgrade comes in 
the form of much slower transfers for the old format due to transcoding.
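The suggestion above amounts to capability negotiation rather than lockstep versioning: the client advertises which dvector formats it understands, and the server picks the newest one both sides share, with the old format kept as a slow, transcoded fallback. A minimal sketch of that selection logic (all names hypothetical, not Drill's actual protocol types):

```java
import java.util.EnumSet;
import java.util.Set;

public class FormatNegotiation {
    // Hypothetical wire formats for the "dvector" parts of a message.
    enum VectorFormat { LEGACY, NEW }

    // The server picks the best format both sides support; the legacy
    // format always remains available as a (slower, transcoded) fallback.
    static VectorFormat negotiate(Set<VectorFormat> clientSupports) {
        return clientSupports.contains(VectorFormat.NEW)
            ? VectorFormat.NEW
            : VectorFormat.LEGACY;
    }

    public static void main(String[] args) {
        System.out.println(negotiate(EnumSet.of(VectorFormat.LEGACY)));
        System.out.println(negotiate(EnumSet.allOf(VectorFormat.class)));
    }
}
```

The key design point is that only the dvector encoding is versioned; the rest of the protocol continues to evolve through Protobuf's soft versioning.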





> Wire protocol versioning, version negotiation
> -
>
> Key: DRILL-5957
> URL: https://issues.apache.org/jira/browse/DRILL-5957
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.11.0
>Reporter: Paul Rogers
>
> Drill has very limited support for evolving its wire protocol. As Drill 
> becomes more widely deployed, this limitation will constrain the project's 
> ability to rapidly evolve the wire protocol based on user experience to 
> improve simplicity or performance, or to minimize resource use.
> Proposed is a standard mechanism to version the API and negotiate the API 
> version between client and server at connect time. The focus here is between 
> Drill clients (JDBC, ODBC) and the Drill server. The same mechanism can also 
> be used between servers to support rolling upgrades.
> This proposal is an outline; it is not a detailed design. The purpose here is 
> to drive understanding of the problem. Once we have that, we can focus on the 
> implementation details.
> h4. Problem Statement
> The problem we wish to address here concerns both the _syntax_ and 
> _semantics_ of API messages. Syntax concerns:
> * The set of messages and their sequence
> * The format of bytes on the wire
> * The format of message packets
> Semantics concerns:
> * The meaning of each field.
> * The layout of non-message data (vectors, in Drill.)
> We wish to introduce a system whereby both syntax and semantics can be 
> evolved in a controlled, known manner such that:
> * A client of version x can connect to, and interoperate with, a server in a 
> range of versions (x-y, x+z) for some values of y and z.
> For example, version x of the Drill client is deployed in the field. It must 
> connect to the oldest Drill cluster available to that client. (That is it 
> must connect to servers up to y versions old.) During an upgrade, the server 
> may be upgraded before the client. Thus, the client must also work with 
> servers up to z versions newer than the client.
> If we wish to tackle rolling upgrades, then y and z can both be 1 for 
> server-to-server APIs. A version x server will talk with (x-1) servers when 
> the cluster upgrades to x, and will talk to (x+1) servers when the cluster is 
> upgraded to version (x+1).
> h4. Current State
> Drill currently provides some ad-hoc version compatibility:
> * Slow change. Drill's APIs have not changed much since Drill 1.0, thereby 
> avoiding the issue.
> * Protobuf support. Drill uses Protobuf for message bodies, leveraging that 
> format's ability to absorb the addition or deprecation of individual fields.
> * API version number. The API holds a version number, though the code to use 
> it is rather ad-hoc.
> The above has allowed clever coding to handle some version changes, but each 
> is a one-off, ad-hoc solution. The recent security work is an example that, 
> with enough effort, ad-hoc solutions can be found.
> The above cannot handle:
> * Change in the message order
> * Change in the "pbody/dbody" structure of each message.
> * Change in the structure of serialized value vectors.
> As a result, the current structure prevents any change to Drill's core 
> mechanism, value vectors, as there is no way for clients and servers to 
> negotiate the vector wire format. For example, Drill cannot adopt Arrow 
> because a pre-Arrow client would not understand 

[jira] [Created] (DRILL-5790) PCAP format explicitly opens local file

2017-09-14 Thread Ted Dunning (JIRA)
Ted Dunning created DRILL-5790:
--

 Summary: PCAP format explicitly opens local file
 Key: DRILL-5790
 URL: https://issues.apache.org/jira/browse/DRILL-5790
 Project: Apache Drill
  Issue Type: Bug
Reporter: Ted Dunning


Note the new FileInputStream line:
{code}
@Override
public void setup(final OperatorContext context, final OutputMutator output)
    throws ExecutionSetupException {
  try {
    this.output = output;
    this.buffer = new byte[10];
    this.in = new FileInputStream(inputPath);
    this.decoder = new PacketDecoder(in);
    this.validBytes = in.read(buffer);
    this.projectedCols = getProjectedColsIfItNull();
    setColumns(projectedColumns);
  } catch (IOException io) {
    throw UserException.dataReadError(io)
        .addContext("File name:", inputPath)
        .build(logger);
  }
}
{code}
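The problem with the snippet above is that FileInputStream bypasses whatever filesystem abstraction Drill hands the reader, so the format only works when the file happens to be on the local disk. One shape the fix can take, sketched with plain java.io types rather than Drill's actual DrillFileSystem API (all names here illustrative): have the reader accept an already-opened stream instead of opening one itself.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch: the reader takes an InputStream supplied by the surrounding
// filesystem layer, so the same code works for local files, HDFS, or
// any other supported store.
public class PcapReaderSketch {
    private final InputStream in;                       // supplied, never opened here
    private final byte[] buffer = new byte[64 * 1024];  // illustrative size
    private int validBytes;

    PcapReaderSketch(InputStream in) {
        this.in = in;
    }

    void setup() throws IOException {
        validBytes = in.read(buffer);  // prime the buffer from the stream
    }

    int validBytes() {
        return validBytes;
    }

    public static void main(String[] args) throws IOException {
        // Any InputStream works; a ByteArrayInputStream stands in for a
        // stream opened by the distributed filesystem layer.
        PcapReaderSketch r = new PcapReaderSketch(
            new ByteArrayInputStream(new byte[] {1, 2, 3, 4}));
        r.setup();
        System.out.println(r.validBytes());   // prints 4
    }
}
```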





[jira] [Commented] (DRILL-5432) Want a memory format for PCAP files

2017-04-24 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981793#comment-15981793
 ] 

Ted Dunning commented on DRILL-5432:



The version in GitHub is now working. Thanks to Charles for the MAC address 
code.

{code}
0: jdbc:drill:zk=local> select src_ip, count(1), sum(packet_length) from 
dfs.`/Users/tdunning/Apache/drill-pcap-format/x.pcap`   group by src_ip;
+--+-+-+
|  src_ip  | EXPR$1  | EXPR$2  |
+--+-+-+
| 10.0.1.5 | 24  | 3478|
| 23.72.217.110| 1   | 66  |
| 199.59.150.11| 1   | 66  |
| 35.167.153.146   | 2   | 194 |
| 149.174.66.131   | 1   | 54  |
| 152.163.13.6 | 1   | 54  |
| 35.166.185.92| 2   | 194 |
| 173.194.202.189  | 2   | 145 |
| 23.72.187.41 | 2   | 132 |
| 108.174.10.10| 4   | 561 |
| 12.220.154.66| 1   | 174 |
| 52.20.156.183| 1   | 98  |
| 74.125.28.189| 1   | 73  |
| 192.30.253.124   | 1   | 66  |
+--+-+-+
{code}

This now implements the basic idea that we want. The only major thing missing 
is the ability to group by TCP stream. You can emulate that by grouping by 
src_ip, dst_ip, src_port, and dst_port, but we want something better.

Can somebody take a look at the code?
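For anyone reviewing the decoder, a useful anchor is the fixed 24-byte PCAP global header that every capture file starts with (magic number, version, snap length, link type, as defined by the pcap file format). A standalone sketch of parsing it, independent of the code under review:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Parse the 24-byte pcap global header. The magic number doubles as a
// byte-order marker: if it reads back-to-front, the file was written
// in the other endianness.
public class PcapHeader {
    static final int MAGIC = 0xa1b2c3d4;

    final int versionMajor, versionMinor, snapLen, linkType;

    PcapHeader(byte[] header) {
        ByteBuffer buf = ByteBuffer.wrap(header, 0, 24);
        buf.order(ByteOrder.LITTLE_ENDIAN);
        int magic = buf.getInt();
        if (magic != MAGIC) {
            buf.order(ByteOrder.BIG_ENDIAN);   // try the other byte order
            buf.position(0);
            magic = buf.getInt();
            if (magic != MAGIC) {
                throw new IllegalArgumentException("not a pcap file");
            }
        }
        versionMajor = buf.getShort() & 0xffff;
        versionMinor = buf.getShort() & 0xffff;
        buf.getInt();                          // thiszone, unused here
        buf.getInt();                          // sigfigs, unused here
        snapLen = buf.getInt();
        linkType = buf.getInt();
    }

    public static void main(String[] args) {
        ByteBuffer b = ByteBuffer.allocate(24).order(ByteOrder.LITTLE_ENDIAN);
        b.putInt(MAGIC).putShort((short) 2).putShort((short) 4)
         .putInt(0).putInt(0).putInt(65_535).putInt(1);
        PcapHeader h = new PcapHeader(b.array());
        System.out.println(h.versionMajor + "." + h.versionMinor
            + " snap=" + h.snapLen + " link=" + h.linkType);
    }
}
```

The per-packet records that follow use a similar fixed 16-byte header (timestamp seconds/microseconds, captured length, original length), which is where lazy deserialization pays off.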


> Want a memory format for PCAP files
> ---
>
> Key: DRILL-5432
> URL: https://issues.apache.org/jira/browse/DRILL-5432
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Ted Dunning
>
> PCAP files [1] are the de facto standard for storing network capture data. In 
> security and protocol applications, it is very common to want to extract 
> particular packets from a capture for further analysis.
> At a first level, it is desirable to query and filter by source and 
> destination IP and port or by protocol. Beyond that, however, it would be 
> very useful to be able to group packets by TCP session and eventually to look 
> at packet contents. For now, however, the most critical requirement is that 
> we should be able to scan captures at very high speed.
> I previously wrote a (kind of working) proof of concept for a PCAP decoder 
> that did lazy deserialization and could traverse hundreds of MB of PCAP data 
> per second per core. This compares to roughly 2-3 MB/s for widely available 
> Apache-compatible open source PCAP decoders.
> This JIRA covers the integration and extension of that proof of concept as a 
> Drill file format.
> Initial work is available at https://github.com/mapr-demos/drill-pcap-format
> [1] https://en.wikipedia.org/wiki/Pcap





[jira] [Updated] (DRILL-5432) Want a memory format for PCAP files

2017-04-12 Thread Ted Dunning (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Dunning updated DRILL-5432:
---
Description: 
PCAP files [1] are the de facto standard for storing network capture data. In 
security and protocol applications, it is very common to want to extract 
particular packets from a capture for further analysis.

At a first level, it is desirable to query and filter by source and destination 
IP and port or by protocol. Beyond that, however, it would be very useful to be 
able to group packets by TCP session and eventually to look at packet contents. 
For now, however, the most critical requirement is that we should be able to 
scan captures at very high speed.

I previously wrote a (kind of working) proof of concept for a PCAP decoder that 
did lazy deserialization and could traverse hundreds of MB of PCAP data per 
second per core. This compares to roughly 2-3 MB/s for widely available 
Apache-compatible open source PCAP decoders.

This JIRA covers the integration and extension of that proof of concept as a 
Drill file format.

Initial work is available at https://github.com/mapr-demos/drill-pcap-format


[1] https://en.wikipedia.org/wiki/Pcap

  was:
PCAP files [1] are the de facto standard for storing network capture data. In 
security and protocol applications, it is very common to want to extract 
particular packets from a capture for further analysis.

At a first level, it is desirable to query and filter by source and destination 
IP and port or by protocol. Beyond that, however, it would be very useful to be 
able to group packets by TCP session and eventually to look at packet contents. 
For now, however, the most critical requirement is that we should be able to 
scan captures at very high speed.

I previously wrote a (kind of working) proof of concept for a PCAP decoder that 
did lazy deserialization and could traverse hundreds of MB of PCAP data per 
second per core. This compares to roughly 2-3 MB/s for widely available 
Apache-compatible open source PCAP decoders.

This JIRA covers the integration and extension of that proof of concept as a 
Drill file format.

Initial work is available at https://github.com/mapr-demos/pcap-query


[1] https://en.wikipedia.org/wiki/Pcap


> Want a memory format for PCAP files
> ---
>
> Key: DRILL-5432
> URL: https://issues.apache.org/jira/browse/DRILL-5432
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Ted Dunning
>
> PCAP files [1] are the de facto standard for storing network capture data. In 
> security and protocol applications, it is very common to want to extract 
> particular packets from a capture for further analysis.
> At a first level, it is desirable to query and filter by source and 
> destination IP and port or by protocol. Beyond that, however, it would be 
> very useful to be able to group packets by TCP session and eventually to look 
> at packet contents. For now, however, the most critical requirement is that 
> we should be able to scan captures at very high speed.
> I previously wrote a (kind of working) proof of concept for a PCAP decoder 
> that did lazy deserialization and could traverse hundreds of MB of PCAP data 
> per second per core. This compares to roughly 2-3 MB/s for widely available 
> Apache-compatible open source PCAP decoders.
> This JIRA covers the integration and extension of that proof of concept as a 
> Drill file format.
> Initial work is available at https://github.com/mapr-demos/drill-pcap-format
> [1] https://en.wikipedia.org/wiki/Pcap





[jira] [Commented] (DRILL-5432) Want a memory format for PCAP files

2017-04-12 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967049#comment-15967049
 ] 

Ted Dunning commented on DRILL-5432:



Wow.  Missed that.

New URL: https://github.com/mapr-demos/drill-pcap-format

I will update the original comment so as to limit the number of people who are 
confused.


> Want a memory format for PCAP files
> ---
>
> Key: DRILL-5432
> URL: https://issues.apache.org/jira/browse/DRILL-5432
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Ted Dunning
>
> PCAP files [1] are the de facto standard for storing network capture data. In 
> security and protocol applications, it is very common to want to extract 
> particular packets from a capture for further analysis.
> At a first level, it is desirable to query and filter by source and 
> destination IP and port or by protocol. Beyond that, however, it would be 
> very useful to be able to group packets by TCP session and eventually to look 
> at packet contents. For now, however, the most critical requirement is that 
> we should be able to scan captures at very high speed.
> I previously wrote a (kind of working) proof of concept for a PCAP decoder 
> that did lazy deserialization and could traverse hundreds of MB of PCAP data 
> per second per core. This compares to roughly 2-3 MB/s for widely available 
> Apache-compatible open source PCAP decoders.
> This JIRA covers the integration and extension of that proof of concept as a 
> Drill file format.
> Initial work is available at https://github.com/mapr-demos/pcap-query
> [1] https://en.wikipedia.org/wiki/Pcap





[jira] [Commented] (DRILL-5432) Want a memory format for PCAP files

2017-04-12 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967041#comment-15967041
 ] 

Ted Dunning commented on DRILL-5432:


Charles,

I don't understand your comment. Tug reported the following output from a 
sample file:
{code}
select *
from dfs.`data`.`airtunes.pcap`
limit 10

+-------+----------+--------------------------+-----------------+-----------------+-----------+-----------+----------------+-------+
| Type  | Network  | Timestamp                | dst_ip          | src_ip          | src_port  | dst_port  | packet_length  | data  |
+-------+----------+--------------------------+-----------------+-----------------+-----------+-----------+----------------+-------+
| TCP   | 1        | 2012-03-29 22:05:41.808  | /192.168.3.123  | /192.168.3.107  | 51594     | 5000      | 78             | []    |
| TCP   | 1        | 2012-03-29 22:05:41.808  | /192.168.3.107  | /192.168.3.123  | 5000      | 51594     | 78             | []    |
| TCP   | 1        | 2012-03-29 22:05:41.808  | /192.168.3.123  | /192.168.3.107  | 51594     | 5000      | 66             | []    |
+-------+----------+--------------------------+-----------------+-----------------+-----------+-----------+----------------+-------+
{code}

What is your change going to do?

> Want a memory format for PCAP files
> ---
>
> Key: DRILL-5432
> URL: https://issues.apache.org/jira/browse/DRILL-5432
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Ted Dunning
>
> PCAP files [1] are the de facto standard for storing network capture data. In 
> security and protocol applications, it is very common to want to extract 
> particular packets from a capture for further analysis.
> At a first level, it is desirable to query and filter by source and 
> destination IP and port or by protocol. Beyond that, however, it would be 
> very useful to be able to group packets by TCP session and eventually to look 
> at packet contents. For now, however, the most critical requirement is that 
> we should be able to scan captures at very high speed.
> I previously wrote a (kind of working) proof of concept for a PCAP decoder 
> that did lazy deserialization and could traverse hundreds of MB of PCAP data 
> per second per core. This compares to roughly 2-3 MB/s for widely available 
> Apache-compatible open source PCAP decoders.
> This JIRA covers the integration and extension of that proof of concept as a 
> Drill file format.
> Initial work is available at https://github.com/mapr-demos/pcap-query
> [1] https://en.wikipedia.org/wiki/Pcap





[jira] [Created] (DRILL-5432) Want a memory format for PCAP files

2017-04-12 Thread Ted Dunning (JIRA)
Ted Dunning created DRILL-5432:
--

 Summary: Want a memory format for PCAP files
 Key: DRILL-5432
 URL: https://issues.apache.org/jira/browse/DRILL-5432
 Project: Apache Drill
  Issue Type: New Feature
Reporter: Ted Dunning


PCAP files [1] are the de facto standard for storing network capture data. In 
security and protocol applications, it is very common to want to extract 
particular packets from a capture for further analysis.

At a first level, it is desirable to query and filter by source and destination 
IP and port or by protocol. Beyond that, however, it would be very useful to be 
able to group packets by TCP session and eventually to look at packet contents. 
For now, however, the most critical requirement is that we should be able to 
scan captures at very high speed.

I previously wrote a (kind of working) proof of concept for a PCAP decoder that 
did lazy deserialization and could traverse hundreds of MB of PCAP data per 
second per core. This compares to roughly 2-3 MB/s for widely available 
Apache-compatible open source PCAP decoders.

This JIRA covers the integration and extension of that proof of concept as a 
Drill file format.

Initial work is available at https://github.com/mapr-demos/pcap-query


[1] https://en.wikipedia.org/wiki/Pcap
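For context on why very fast scanning is plausible: a classic libpcap file is just a 24-byte global header followed by, for each packet, a 16-byte record header and the raw packet bytes. A reader can pull timestamps and lengths while skipping packet bodies entirely. A minimal sketch of that lazy-scan idea in Python (an illustration of the file layout, not the Drill reader):

```python
import struct

GLOBAL_HDR_LEN = 24  # magic, version, thiszone, sigfigs, snaplen, linktype

def scan_pcap(buf):
    """Yield (ts_sec, ts_usec, incl_len) per packet, never touching payloads."""
    magic = struct.unpack_from("<I", buf, 0)[0]
    if magic == 0xa1b2c3d4:        # file written little-endian
        pkt_hdr = struct.Struct("<IIII")
    elif magic == 0xd4c3b2a1:      # file written big-endian
        pkt_hdr = struct.Struct(">IIII")
    else:
        raise ValueError("not a pcap file")
    off = GLOBAL_HDR_LEN
    while off + pkt_hdr.size <= len(buf):
        ts_sec, ts_usec, incl_len, _orig_len = pkt_hdr.unpack_from(buf, off)
        yield ts_sec, ts_usec, incl_len
        off += pkt_hdr.size + incl_len  # skip the body: lazy deserialization
```

Filtering by port or protocol would then decode only the few header bytes it needs from each packet body.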





[jira] [Commented] (DRILL-4884) Drill produced IOB exception while querying data of 65536 limitation using non batched reader

2016-10-24 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15603623#comment-15603623
 ] 

Ted Dunning commented on DRILL-4884:


Hmm, putting four copies of my parquet file into a directory made no 
difference.

Can't seem to replicate this.



> Drill produced IOB exception while querying data of 65536 limitation using 
> non batched reader
> -
>
> Key: DRILL-4884
> URL: https://issues.apache.org/jira/browse/DRILL-4884
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.8.0
> Environment: CentOS 6.5 / JAVA 8
>Reporter: Hongze Zhang
>Assignee: Jinfeng Ni
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Drill produces IOB while using a non batched scanner and limiting SQL by 
> 65536.
> SQL:
> {noformat}
> select id from xx limit 1 offset 65535
> {noformat}
> Result:
> {noformat}
>   at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:324)
>  [classes/:na]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:184)
>  [classes/:na]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:290)
>  [classes/:na]
>   at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [classes/:na]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [na:1.8.0_101]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_101]
>   at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]
> Caused by: java.lang.IndexOutOfBoundsException: index: 131072, length: 2 
> (expected: range(0, 131072))
>   at io.netty.buffer.DrillBuf.checkIndexD(DrillBuf.java:175) 
> ~[classes/:4.0.27.Final]
>   at io.netty.buffer.DrillBuf.chk(DrillBuf.java:197) 
> ~[classes/:4.0.27.Final]
>   at io.netty.buffer.DrillBuf.setChar(DrillBuf.java:517) 
> ~[classes/:4.0.27.Final]
>   at 
> org.apache.drill.exec.record.selection.SelectionVector2.setIndex(SelectionVector2.java:79)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.physical.impl.limit.LimitRecordBatch.limitWithNoSV(LimitRecordBatch.java:167)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.physical.impl.limit.LimitRecordBatch.doWork(LimitRecordBatch.java:145)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:93)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.physical.impl.limit.LimitRecordBatch.innerNext(LimitRecordBatch.java:115)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:94)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:132)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) 
> ~[classes/:na]
>   at 
> 

[jira] [Commented] (DRILL-4884) Drill produced IOB exception while querying data of 65536 limitation using non batched reader

2016-10-24 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15603615#comment-15603615
 ] 

Ted Dunning commented on DRILL-4884:


I just did an experiment (which must be flawed) to try to recreate this problem.

First, I created a table:
{code}
drop table maprfs.ted.`q1.parquet`;
create table maprfs.ted.`q1.parquet` as
with x1(a,b) as (values (1, rand()-0.5), (1, rand()-0.5)),
  x2 as (select t1.a as a, t1.b + t2.b + t3.b + t4.b as b
           from x1 t1, x1 t2, x1 t3, x1 t4
          where t1.a = t2.a and t2.a = t3.a and t3.a = t4.a),
  x3 as (select t1.a as a, t1.b + t2.b + t3.b + t4.b as b
           from x2 t1, x2 t2, x2 t3, x2 t4
          where t1.a = t2.a and t2.a = t3.a and t3.a = t4.a),
  x4 as (select t1.a as a, t1.b + t2.b + t3.b + t4.b as b
           from x1 t1, x1 t2, x1 t3, x3 t4
          where t1.a = t2.a and t2.a = t3.a and t3.a = t4.a)
  select * from x4;
{code}

This table has about half a million rows (x1 has 2 rows, x2 has 2^4 = 16, x3 has 
16^4 = 65,536, and x4 has 2 * 2 * 2 * 65,536 = 524,288):
{code}
0: jdbc:drill:> select count(*) from maprfs.ted.`q1.parquet`;
+---------+
| EXPR$0  |
+---------+
| 524288  |
+---------+
{code}
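The row counts follow from the join arithmetic: every level is a self-join on the constant key a, so the counts simply multiply. A quick check, mirroring the CTE levels above:

```python
x1 = 2                  # two rows, both with a = 1
x2 = x1 ** 4            # four-way self-join of x1: 2^4 = 16
x3 = x2 ** 4            # four-way self-join of x2: 16^4 = 65536
x4 = x1 * x1 * x1 * x3  # x1, x1, x1, x3 joined: 8 * 65536 = 524288
print(x2, x3, x4)
```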

Unfortunately, I can't get Drill to fail using a limit of 65536±1. Or 100,000. 
Or 200,000.

Is the phrase "non batched scanner" somehow magical here? Or do I need to 
have multiple files in a directory?
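One arithmetic observation that fits the reported trace: SelectionVector2 stores 16-bit entries, two bytes each, so the first record past 65,535 would be written at byte offset 131072 of a 131072-byte buffer, which is exactly the index in the IndexOutOfBoundsException. A back-of-envelope check (not the actual Drill code):

```python
SV2_ENTRY_BYTES = 2                 # SelectionVector2 holds 16-bit indexes
buf_len = 65536 * SV2_ENTRY_BYTES   # 131072-byte selection buffer
entry = 65536                       # offset 65535 plus one more record
offset = entry * SV2_ENTRY_BYTES    # byte position of that entry's write
print(offset, buf_len)              # both 131072: write falls outside range(0, 131072)
```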




> Drill produced IOB exception while querying data of 65536 limitation using 
> non batched reader
> -
>
> Key: DRILL-4884
> URL: https://issues.apache.org/jira/browse/DRILL-4884
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.8.0
> Environment: CentOS 6.5 / JAVA 8
>Reporter: Hongze Zhang
>Assignee: Jinfeng Ni
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Drill produces IOB while using a non batched scanner and limiting SQL by 
> 65536.
> SQL:
> {noformat}
> select id from xx limit 1 offset 65535
> {noformat}
> Result:
> {noformat}
>   at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:324)
>  [classes/:na]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:184)
>  [classes/:na]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:290)
>  [classes/:na]
>   at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [classes/:na]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [na:1.8.0_101]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_101]
>   at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]
> Caused by: java.lang.IndexOutOfBoundsException: index: 131072, length: 2 
> (expected: range(0, 131072))
>   at io.netty.buffer.DrillBuf.checkIndexD(DrillBuf.java:175) 
> ~[classes/:4.0.27.Final]
>   at io.netty.buffer.DrillBuf.chk(DrillBuf.java:197) 
> ~[classes/:4.0.27.Final]
>   at io.netty.buffer.DrillBuf.setChar(DrillBuf.java:517) 
> ~[classes/:4.0.27.Final]
>   at 
> org.apache.drill.exec.record.selection.SelectionVector2.setIndex(SelectionVector2.java:79)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.physical.impl.limit.LimitRecordBatch.limitWithNoSV(LimitRecordBatch.java:167)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.physical.impl.limit.LimitRecordBatch.doWork(LimitRecordBatch.java:145)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:93)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.physical.impl.limit.LimitRecordBatch.innerNext(LimitRecordBatch.java:115)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:94)
>  ~[classes/:na]
>   at 
> 

[jira] [Commented] (DRILL-4754) Missing values are not missing

2016-06-26 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15350440#comment-15350440
 ] 

Ted Dunning commented on DRILL-4754:



This other bug (from 18 months ago with no apparent progress) notes the 
conflation of empty and missing, but doesn't directly address it.

> Missing values are not missing
> --
>
> Key: DRILL-4754
> URL: https://issues.apache.org/jira/browse/DRILL-4754
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Ted Dunning
>
> If I have a query which reads from a JSON file where a field is a list or is 
> missing, then the records where the field should be missing will instead have a 
> value for that field that is an empty list:
> {code}
> 0: jdbc:drill:> select * from maprfs.ted.`bug.json`;
> +-----+--------+-------+
> | *a* |  *b*   |  *c*  |
> +-----+--------+-------+
> | 3   | [3,2]  | xyz   |
> | 7   | []     | wxy   |
> | 7   | []     | null  |
> +-----+--------+-------+
> 2 rows selected (1.279 seconds)
> {code}
> where the file in question contains these three records:
> {code}
> {'a':3, 'b':[3,2], 'c':'xyz'}
> {'a':7, 'c':'wxy'}
> {"a":7, "b":[]}
> {code}
> The problem is in the second record of the result. I would have expected b to 
> have had the value NULL.
> I am using drill-1.6.0.





[jira] [Commented] (DRILL-4754) Missing values are not missing

2016-06-26 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15350430#comment-15350430
 ] 

Ted Dunning commented on DRILL-4754:



Hmm... I can't find any such JIRA's just off hand.

I see DRILL-3831, but that seems to be a very different matter.

I will look further.


> Missing values are not missing
> --
>
> Key: DRILL-4754
> URL: https://issues.apache.org/jira/browse/DRILL-4754
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Ted Dunning
>
> If I have a query which reads from a JSON file where a field is a list or is 
> missing, then the records where the field should be missing will instead have a 
> value for that field that is an empty list:
> {code}
> 0: jdbc:drill:> select * from maprfs.ted.`bug.json`;
> +-----+--------+-------+
> | *a* |  *b*   |  *c*  |
> +-----+--------+-------+
> | 3   | [3,2]  | xyz   |
> | 7   | []     | wxy   |
> | 7   | []     | null  |
> +-----+--------+-------+
> 2 rows selected (1.279 seconds)
> {code}
> where the file in question contains these three records:
> {code}
> {'a':3, 'b':[3,2], 'c':'xyz'}
> {'a':7, 'c':'wxy'}
> {"a":7, "b":[]}
> {code}
> The problem is in the second record of the result. I would have expected b to 
> have had the value NULL.
> I am using drill-1.6.0.





[jira] [Updated] (DRILL-4754) Missing values are not missing

2016-06-26 Thread Ted Dunning (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Dunning updated DRILL-4754:
---
Description: 
If I have a query which reads from a JSON file where a field is a list or is 
missing, then the records where the field should be missing will instead have a 
value for that field that is an empty list:
{{
0: jdbc:drill:> select * from maprfs.ted.`bug.json`;
+-----+--------+-------+
| *a* |  *b*   |  *c*  |
+-----+--------+-------+
| 3   | [3,2]  | xyz   |
| 7   | []     | wxy   |
| 7   | []     | null  |
+-----+--------+-------+
2 rows selected (1.279 seconds)
}}
where the file in question contains these three records:
{{
{'a':3, 'b':[3,2], 'c':'xyz'}
{'a':7, 'c':'wxy'}
{"a":7, "b":[]}
}}
The problem is in the second record of the result. I would have expected b to 
have had the value NULL.

I am using drill-1.6.0.




  was:
If I have a query which reads from a JSON file where a field is a list or is 
missing, then the records where the field should be missing will instead have a 
value for that field that is an empty list:
{{
0: jdbc:drill:> select * from maprfs.ted.`bug.json`;
+-----+--------+-------+
| *a* |   b    |   c   |
+-----+--------+-------+
| 3   | [3,2]  | xyz   |
| 7   | []     | wxy   |
| 7   | []     | null  |
+-----+--------+-------+
2 rows selected (1.279 seconds)
}}
where the file in question contains these three records:
{{
{'a':3, 'b':[3,2], 'c':'xyz'}
{'a':7, 'c':'wxy'}
{"a":7, "b":[]}
}}
The problem is in the second record of the result. I would have expected b to 
have had the value NULL.

I am using drill-1.6.0.





> Missing values are not missing
> --
>
> Key: DRILL-4754
> URL: https://issues.apache.org/jira/browse/DRILL-4754
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Ted Dunning
>
> If I have a query which reads from a JSON file where a field is a list or is 
> missing, then the records where the field should be missing will instead have a 
> value for that field that is an empty list:
> {{
> 0: jdbc:drill:> select * from maprfs.ted.`bug.json`;
> +-----+--------+-------+
> | *a* |  *b*   |  *c*  |
> +-----+--------+-------+
> | 3   | [3,2]  | xyz   |
> | 7   | []     | wxy   |
> | 7   | []     | null  |
> +-----+--------+-------+
> 2 rows selected (1.279 seconds)
> }}
> where the file in question contains these three records:
> {{
> {'a':3, 'b':[3,2], 'c':'xyz'}
> {'a':7, 'c':'wxy'}
> {"a":7, "b":[]}
> }}
> The problem is in the second record of the result. I would have expected b to 
> have had the value NULL.
> I am using drill-1.6.0.





[jira] [Created] (DRILL-4754) Missing values are not missing

2016-06-26 Thread Ted Dunning (JIRA)
Ted Dunning created DRILL-4754:
--

 Summary: Missing values are not missing
 Key: DRILL-4754
 URL: https://issues.apache.org/jira/browse/DRILL-4754
 Project: Apache Drill
  Issue Type: Bug
Reporter: Ted Dunning


If I have a query which reads from a JSON file where a field is a list or is 
missing, then the records where the field should be missing will instead have a 
value for that field that is an empty list:

0: jdbc:drill:> select * from maprfs.ted.`bug.json`;
+----+--------+------+
| a  |   b    |  c   |
+----+--------+------+
| 3  | [3,2]  | xyz  |
| 7  | []     | wxy  |
+----+--------+------+
2 rows selected (1.279 seconds)

where the file in question contains these two records:

{'a':3, 'b':[3,2], 'c':'xyz'}
{'a':7, 'c':'wxy'}

The problem is in the second record of the result. I would have expected b to 
have had the value NULL.

I am using drill-1.6.0.
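The distinction being asked for is between a key that is absent and a key whose value is an empty list. In ordinary JSON handling the two are easy to tell apart, which is what makes the conflation surprising; a small illustration (using valid double-quoted JSON, since the records above use single quotes):

```python
import json

rows = [json.loads(s) for s in (
    '{"a": 3, "b": [3, 2], "c": "xyz"}',
    '{"a": 7, "c": "wxy"}',
)]
# A reader that preserves missing-ness surfaces None (SQL NULL) for the
# absent "b" in the second record, not an empty list.
print([r.get("b") for r in rows])  # [[3, 2], None]
```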








[jira] [Commented] (DRILL-3912) Common subexpression elimination

2015-10-07 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947721#comment-14947721
 ] 

Ted Dunning commented on DRILL-3912:


It sounds like this only deals with common sub-expressions in expressions.

A far more significant optimization would be to deal with common 
sub-expressions at a larger scale.  A classic case is multiple re-use of a 
single expression in a common table expression.  For instance,

{code}
with x as (select dir0, id from dfs.tdunning.zoom where id < 12),  
   y as (select id, count(*) cnt from x group by id),
   z as (select count(distinct id) id_count from x)
select dir0, x.id, y.cnt from x, y, z
 where x.id = y.id and y.cnt / z.id_count > 3
{code}

Without good sub-expression elimination, table zoom will be scanned three 
times. Last I heard, DRILL doesn't optimize this away.
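For the expression-level case, the usual mechanism is to key a cache on the structure of each subtree so that identical subtrees are evaluated once. A toy interpreter sketch of that idea (illustrative only, not Drill's code generator):

```python
def eval_cse(expr, env, cache, count):
    """Evaluate a nested-tuple expression, reusing identical subtrees."""
    if not isinstance(expr, tuple):
        return env.get(expr, expr)   # variable name or literal
    if expr in cache:
        return cache[expr]           # common subexpression: reuse the result
    op, lhs, rhs = expr
    count[0] += 1                    # count genuine node evaluations
    l = eval_cse(lhs, env, cache, count)
    r = eval_cse(rhs, env, cache, count)
    value = {"add": l + r, "sub": l - r, "mul": l * r}[op]
    cache[expr] = value
    return value

# select a + 1, (a + 1) * (a - 1): the shared (a + 1) is computed once
count, cache = [0], {}
plus = ("add", "a", 1)
first = eval_cse(plus, {"a": 2}, cache, count)
second = eval_cse(("mul", plus, ("sub", "a", 1)), {"a": 2}, cache, count)
print(first, second, count[0])  # three node evaluations instead of four
```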

> Common subexpression elimination
> 
>
> Key: DRILL-3912
> URL: https://issues.apache.org/jira/browse/DRILL-3912
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Steven Phillips
>Assignee: Steven Phillips
>
> Drill currently will evaluate the full expression tree, even if there are 
> redundant subtrees. Many of these redundant evaluations can be eliminated by 
> reusing the results from previously evaluated expression trees.
> For example,
> {code}
> select a + 1, (a + 1)* (a - 1) from t
> {code}
> Will compute the entire (a + 1) expression twice. With CSE, it will only be 
> evaluated once.
> The benefit will be reducing the work done when evaluating expressions, as 
> well as reducing the amount of code that is generated, which could also lead 
> to better JIT optimization.





[jira] [Commented] (DRILL-3894) Directory functions (MaxDir, MinDir ..) should have optional filename parameter

2015-10-05 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14943954#comment-14943954
 ] 

Ted Dunning commented on DRILL-3894:


Actually, I just tested a bit with this.  I agree that this is a valid request, 
but there is actually a trivial (but not necessarily obvious work-around).

TLDR: Just use '.' as the table name.

I created a workspace zoom under my home directory as the directory zoom.

From my home directory, I can use {{MAXDIR}} as expected:
{code}
0: jdbc:drill:> select count(*) from dfs.tdunning.zoom;
+---------+
| EXPR$0  |
+---------+
| 600     |
+---------+
1 row selected (0.378 seconds)
0: jdbc:drill:> select count(*) from dfs.tdunning.zoom where dir0 = 
MAXDIR('dfs.tdunning', 'zoom');
+---------+
| EXPR$0  |
+---------+
| 200     |
+---------+
1 row selected (0.799 seconds)
{code}
So that all works. If I try to touch the zoom work-space, I immediately have 
some issues because a workspace isn't a table.

{code}
0: jdbc:drill:> select count(*) from dfs.zoom;

Error: PARSE ERROR: From line 1, column 22 to line 1, column 24: Table 
'dfs.zoom' not found
{code}
Using the hack of {{`.`}} as a table resolves this, however:

{code}
0: jdbc:drill:> select count(*) from dfs.zoom.`.`;
+---------+
| EXPR$0  |
+---------+
| 600     |
+---------+
1 row selected (0.336 seconds)
0: jdbc:drill:> select count(*) from dfs.zoom.`.` where dir0 = 
maxdir('dfs.zoom', '.');
+---------+
| EXPR$0  |
+---------+
| 200     |
+---------+
1 row selected (0.777 seconds)
0: jdbc:drill:> 
{code}

> Directory functions (MaxDir, MinDir ..) should have optional filename 
> parameter
> ---
>
> Key: DRILL-3894
> URL: https://issues.apache.org/jira/browse/DRILL-3894
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Affects Versions: 1.2.0
>Reporter: Neeraja
>
> https://drill.apache.org/docs/query-directory-functions/
> The directory functions documented above should provide ability to have 
> second parameter(file name) as optional.





[jira] [Commented] (DRILL-3815) unknown suffixes .not_json and .json_not treated differently (multi-file case)

2015-09-21 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901691#comment-14901691
 ] 

Ted Dunning commented on DRILL-3815:


Daniel,

Did you trace this a bit to see where the extensions are being matched?

Could it be a naively constructed regex?  Kinda smells like that.
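It does smell like a substring match rather than a suffix match; here is the difference on the file names from the report (illustrative Python, not the Drill matching code):

```python
ext = "json"
names = ["voter1.json", "voter2.not_json", "voter2.json_not"]

contains_match = [n for n in names if ext in n]             # naive "contains" test
suffix_match = [n for n in names if n.endswith("." + ext)]  # proper suffix test
print(contains_match)  # all three names match
print(suffix_match)    # only voter1.json
```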



> unknown suffixes .not_json and .json_not treated differently (multi-file case)
> --
>
> Key: DRILL-3815
> URL: https://issues.apache.org/jira/browse/DRILL-3815
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Other
>Reporter: Daniel Barclay (Drill)
>Assignee: Jacques Nadeau
>
> In scanning a directory subtree used as a table, unknown filename extensions 
> seem to be treated differently depending on whether they're similar to known 
> file extensions.  The behavior suggests that Drill checks whether a file name 
> _contains_ an extension's string rather than _ending_ with it. 
> For example, given these subtrees with almost identical leaf file names:
> {noformat}
> $ find /tmp/testext_xx_json/
> /tmp/testext_xx_json/
> /tmp/testext_xx_json/voter2.not_json
> /tmp/testext_xx_json/voter1.json
> $ find /tmp/testext_json_xx/
> /tmp/testext_json_xx/
> /tmp/testext_json_xx/voter1.json
> /tmp/testext_json_xx/voter2.json_not
> $ 
> {noformat}
> the results of trying to use them as tables differs:
> {noformat}
> 0: jdbc:drill:zk=local> SELECT *   FROM `dfs.tmp`.`testext_xx_json`;
> Sep 21, 2015 11:41:50 AM 
> org.apache.calcite.sql.validate.SqlValidatorException 
> ...
> Error: VALIDATION ERROR: From line 1, column 17 to line 1, column 25: Table 
> 'dfs.tmp.testext_xx_json' not found
> [Error Id: 6fe41deb-0e39-43f6-beca-de27b39d276b on dev-linux2:31010] 
> (state=,code=0)
> 0: jdbc:drill:zk=local> SELECT *   FROM `dfs.tmp`.`testext_json_xx`;
> +------------------------+
> |         onecf          |
> +------------------------+
> | {"name":"someName1"}  |
> | {"name":"someName2"}  |
> +------------------------+
> 2 rows selected (0.149 seconds)
> {noformat}
> (Other probing seems to indicate that there is also some sensitivity to 
> whether the extension contains an underscore character.)





[jira] [Commented] (DRILL-3698) Expose Show Files Command As SQL for sorting/filtering

2015-08-23 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708633#comment-14708633
 ] 

Ted Dunning commented on DRILL-3698:



I think that making [show tables] into the equivalent of a select query is the 
way to go with this.  If that can work syntactically, then everything you want 
just falls out directly.



 Expose Show Files Command As SQL for sorting/filtering
 --

 Key: DRILL-3698
 URL: https://issues.apache.org/jira/browse/DRILL-3698
 Project: Apache Drill
  Issue Type: Improvement
  Components: SQL Parser
Affects Versions: Future
 Environment: All
Reporter: John Omernik
Assignee: Aman Sinha
  Labels: features
 Fix For: Future


 When using drill, I had a workspace setup, and I found myself using the show 
 files command often to find my directories etc. The thing is, the return of 
 show files is not ordered.  And when looking at file system data there are 
 many possible ways to order the results for efficiency as a user.  
 Consider the ls command in unix; the ability to specify different sorting is 
 built in there.  I checked out 
 http://drill.apache.org/docs/show-files-command/ as well as tried the 
 obvious show files order by name and that didn't work nor did I see how I 
 could in the documentation. 
 Based on a mailing list discussion there is no way to do that currently in 
 Drill, hence this JIRA I think just adding ORDER BY SQL methodology would be 
 perfect here, you have 8 fields (seen below) and ordering by any one of them, 
 or group of them, with ASC/DESC just like standard SQL order by would be a 
 huge win.  
 I suppose one could potentially ask for WHERE clause (filtering)too, and 
 maybe a select (which fields to display) however I am more concerned with the 
 order, but if I had to implement all there I could see examples below:
 (All Three, select, where, and order) (I.e. after Files if the token isn't 
 WHERE  or ORDER then check for the fields, if it's not a valid field list 
 error)
 SHOW FILES name, accessTime WHERE name like '%.csv' ORDER BY name;
 (Where clause and order, note the token after FILES is WHERE)
 SHOW FILES WHERE name like '%.csv' ORDER BY length ASC, name DESC;
 (Only Order, ORDER Is the first token after FILES)
 SHOW FILES ORDER BY length ASC, name DESC
 I don't think we have to grant full SQL functionality here (i.e. aggregates), 
 just the ability to display various fields, filter on criteria, and ordering. 
 If you wanted to get fancy, I suppose you could take the table and make it a 
 full on table, i.e. take the results make it a quick inmemory table and then 
 utilize the whole drill stack on it.  Lots of options.  I just wanted to get 
 this down in an email/JIRA as it was something I found myself wishing I had 
 over and over during data exploration. 
 Fields Currently Returned:
 |name| isDirectory|isFile|length|owner 
 group|permissions|accessTime|modificationTime|
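The requested semantics can be modeled on an ordinary directory listing; a rough sketch of SHOW FILES ... ORDER BY over a subset of the fields listed above (plain Python with a hypothetical helper name, not a proposed implementation):

```python
import os

def show_files(path, key="name", reverse=False):
    """Rough analogue of SHOW FILES ... ORDER BY <key> [DESC]."""
    rows = [{"name": entry.name,
             "isDirectory": entry.is_dir(),
             "isFile": entry.is_file(),
             "length": entry.stat().st_size}
            for entry in os.scandir(path)]
    return sorted(rows, key=lambda row: row[key], reverse=reverse)
```

For example, show_files(path, key="length", reverse=True) corresponds to SHOW FILES ORDER BY length DESC.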





[jira] [Created] (DRILL-3545) Need documentation on BINARY_STRING and STRING_BINARY functions

2015-07-22 Thread Ted Dunning (JIRA)
Ted Dunning created DRILL-3545:
--

 Summary: Need documentation on BINARY_STRING and STRING_BINARY 
functions
 Key: DRILL-3545
 URL: https://issues.apache.org/jira/browse/DRILL-3545
 Project: Apache Drill
  Issue Type: Bug
Reporter: Ted Dunning


These are darn handy but we need to document them so the community at large can 
find out about them.







[jira] [Created] (DRILL-3544) Need better error messages when convert_to is given a bad type

2015-07-22 Thread Ted Dunning (JIRA)
Ted Dunning created DRILL-3544:
--

 Summary: Need better error messages when convert_to is given a bad 
type
 Key: DRILL-3544
 URL: https://issues.apache.org/jira/browse/DRILL-3544
 Project: Apache Drill
  Issue Type: Bug
Reporter: Ted Dunning


The first query below fails because I used UTF-8 instead of UTF8.  This should 
have a decent error message.

{code}
0: jdbc:drill:zk=local> SELECT CONVERT_TO('[ [1, 2], [3, 4], [5]]' ,'UTF-8') AS 
MYCOL1 FROM sys.version;
Error: SYSTEM ERROR: org.apache.drill.exec.work.foreman.ForemanException: 
Unexpected exception during fragment initialization: null

[Error Id: 899207da-2338-4b09-bdc8-8e12e320b661 on 172.16.0.61:31010] 
(state=,code=0)
0: jdbc:drill:zk=local> SELECT CONVERT_TO('[ [1, 2], [3, 4], [5]]' ,'UTF8') AS 
MYCOL1 FROM sys.version;
+-------------+
|   MYCOL1    |
+-------------+
| [B@71f3d3a  |
+-------------+
1 row selected (0.108 seconds)
0: jdbc:drill:zk=local> 
{code}





[jira] [Created] (DRILL-3516) UDF documentation doesn't get people where they need to be

2015-07-20 Thread Ted Dunning (JIRA)
Ted Dunning created DRILL-3516:
--

 Summary: UDF documentation doesn't get people where they need to be
 Key: DRILL-3516
 URL: https://issues.apache.org/jira/browse/DRILL-3516
 Project: Apache Drill
  Issue Type: Bug
Reporter: Ted Dunning


The UDF documentation on the web side rooted at 
http://drill.apache.org/docs/develop-custom-functions/ does not describe the 
high level process for how UDF's are used, nor does it describe why simple 
things like toString can't work on *Holder data structures.

This leads to huge confusion and frustration on the part of potential 
contributors.

Here are some pertinent threads:

http://mail-archives.apache.org/mod_mbox/drill-user/201507.mbox/%3CCACAwhF%3DBxs-bXNdrm0pNJ4e8hZiaueqtZMhJ%3DRiBpf%3Dt%3DzEOWA%40mail.gmail.com%3E

http://mail-archives.apache.org/mod_mbox/drill-user/201507.mbox/%3CCACAwhFmwWkP6udc05UEGFTzpEsaRvAxSRKW%2B2Mg-ijYX8QoQxQ%40mail.gmail.com%3E







[jira] [Comment Edited] (DRILL-3517) Add a UDF development FAQ

2015-07-20 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634309#comment-14634309
 ] 

Ted Dunning edited comment on DRILL-3517 at 7/21/15 1:44 AM:
-

8) My UDF is not a pure function so calling it with the same arguments will 
result in different values each time.  Drill is only using the result of 
calling my function once.  How can I fix that?

- Add the isRandom flag to the annotation that defines your class as a 
function:

 @FunctionTemplate(isRandom = true, ...)



was (Author: tdunning):

8) My UDF is not a pure function so calling it with the same arguments will 
result in different values each time.  Drill is only using the result of 
calling my function once.  How can I fix that?



 Add a UDF development FAQ
 -

 Key: DRILL-3517
 URL: https://issues.apache.org/jira/browse/DRILL-3517
 Project: Apache Drill
  Issue Type: Sub-task
  Components: Documentation
Reporter: Jacques Nadeau
Assignee: Bridget Bevens

 Lets create a UDF FAQ of common issues, log entries, etc so that people know 
 what to do when they hit certain issues.





[jira] [Commented] (DRILL-3518) Do a better job of providing conceptual overview to UDF creation

2015-07-20 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634262#comment-14634262
 ] 

Ted Dunning commented on DRILL-3518:


Pitfalls I have fallen into:

1) All data that a UDF uses must be annotated and must be a type that the 
annotation accepts

2) All class references must be fully qualified

3) It is super-easy to make a UDF that doesn't actually load and it is hard to 
see why (at first)

4) UDAF's can't have complex @Workspace variables because there seems to be no 
way to allocate even a Repeated* value, much less to have a ComplexWriter in 
the @Workspace

5) The annotated input, workspace and output variables have life-cycles that 
aren't apparent from the lexical structure of a UDAF. The fact that at add time 
the add() method can't see the output and that at output time both workspace 
and output variables are visible is confusing.

6) figuring out the maven-fu to create acceptable jars takes quite a while



 Do a better job of providing conceptual overview to UDF creation
 

 Key: DRILL-3518
 URL: https://issues.apache.org/jira/browse/DRILL-3518
 Project: Apache Drill
  Issue Type: Sub-task
  Components: Documentation
Reporter: Jacques Nadeau
Assignee: Bridget Bevens

 Since UDFs are effectively written in Java, people find it confusing when 
 some Java features aren't supported.  Let's try to do a better job of 
 outlining the pitfalls.





[jira] [Commented] (DRILL-3517) Add a UDF development FAQ

2015-07-20 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634253#comment-14634253
 ] 

Ted Dunning commented on DRILL-3517:


Let's start in these comments:

1) Drill UDF's are not run in the lexical environment that you might think
  1.a) imports will just confuse you
  1.b) all class references need to be fully qualified
  1.c) fields that you define will get left behind

2) Inputs and outputs have to be in *Holder types

3) complex outputs can be created using the ComplexWriter

4) you need to build both source and binary jars.  Examples available


Not sure if these are FAQ format anymore.


 Add a UDF development FAQ
 -

 Key: DRILL-3517
 URL: https://issues.apache.org/jira/browse/DRILL-3517
 Project: Apache Drill
  Issue Type: Sub-task
  Components: Documentation
Reporter: Jacques Nadeau
Assignee: Bridget Bevens

 Let's create a UDF FAQ of common issues, log entries, etc. so that people know 
 what to do when they hit certain issues.





[jira] [Commented] (DRILL-3517) Add a UDF development FAQ

2015-07-20 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634258#comment-14634258
 ] 

Ted Dunning commented on DRILL-3517:


More in the line of FAQ's:

1) my UDF didn't get loaded.  What happened?
- see the log
- you probably have a subtle error related to the lexical environment

2) my UDF got loaded, but didn't get used.  What happened?
- you may not have the right types/may need more type variants

3) Is there an example of a simple transforming UDF?

4) Is there a sample of a UDAF?

5) How can I use my own types as temporary data in my UDAF?

6) I defined fields in my class that implements a UDF, but I get a compile 
error that says that they are undefined.  Heh?

7) How can I see the generated code that includes my UDF?



 Add a UDF development FAQ
 -

 Key: DRILL-3517
 URL: https://issues.apache.org/jira/browse/DRILL-3517
 Project: Apache Drill
  Issue Type: Sub-task
  Components: Documentation
Reporter: Jacques Nadeau
Assignee: Bridget Bevens

 Let's create a UDF FAQ of common issues, log entries, etc. so that people know 
 what to do when they hit certain issues.





[jira] [Commented] (DRILL-3520) Provide better logging around UDF loading and module loading

2015-07-20 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634267#comment-14634267
 ] 

Ted Dunning commented on DRILL-3520:



I think that the function logging should become much more extensive if there 
are any functions that fail to load.  A list of jars that had failed functions, 
a list that were loaded without error and so on would be very helpful, as would 
a dump of the classpath.  The volume of logging isn't a big deal since this is 
a one-off that only occurs when things are borked.




 Provide better logging around UDF loading and module loading
 

 Key: DRILL-3520
 URL: https://issues.apache.org/jira/browse/DRILL-3520
 Project: Apache Drill
  Issue Type: Sub-task
Reporter: Jacques Nadeau

 When adding an extension to Drill, sometimes it is hard to know what is going 
 on.  We should:
 - improve logging so that we report at INFO level information about all the 
 Jar files that were included in Drill's consideration set (marked) and those 
 that are not. (and include this debug analysis in the documentation)
 - If Drill fails to load a function, register an error function so that 
 trying to invoke the UDF will provide the user with information about why the 
 function failed to load.





[jira] [Commented] (DRILL-3517) Add a UDF development FAQ

2015-07-20 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634309#comment-14634309
 ] 

Ted Dunning commented on DRILL-3517:



8) My UDF is not a pure function so calling it with the same arguments will 
result in different values each time.  Drill is only using the result of 
calling my function once.  How can I fix that?



 Add a UDF development FAQ
 -

 Key: DRILL-3517
 URL: https://issues.apache.org/jira/browse/DRILL-3517
 Project: Apache Drill
  Issue Type: Sub-task
  Components: Documentation
Reporter: Jacques Nadeau
Assignee: Bridget Bevens

 Let's create a UDF FAQ of common issues, log entries, etc. so that people know 
 what to do when they hit certain issues.





[jira] [Updated] (DRILL-3461) Need to meet basic coding standards

2015-07-06 Thread Ted Dunning (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Dunning updated DRILL-3461:
---
Attachment: no-javadocs.txt

This is a list of the 1220 classes with no javadocs

 Need to meet basic coding standards
 ---

 Key: DRILL-3461
 URL: https://issues.apache.org/jira/browse/DRILL-3461
 Project: Apache Drill
  Issue Type: Bug
Reporter: Ted Dunning
 Attachments: no-javadocs.txt


 1220 classes in Drill have no Javadocs whatsoever.  I will attach a detailed 
 list.
 Some kind of expression of intent and basic place in the architecture should 
 be included in all classes.
 The good news is that at least there are 1838 (1868 in 1.1.0 branch) classes 
 that have at least some kind of javadocs. 
 I would be happy to help write comments, but I can't figure out what these 
 classes do.





[jira] [Created] (DRILL-3461) Need to meet basic coding standards

2015-07-06 Thread Ted Dunning (JIRA)
Ted Dunning created DRILL-3461:
--

 Summary: Need to meet basic coding standards
 Key: DRILL-3461
 URL: https://issues.apache.org/jira/browse/DRILL-3461
 Project: Apache Drill
  Issue Type: Bug
Reporter: Ted Dunning


1220 classes in Drill have no Javadocs whatsoever.  I will attach a detailed 
list.

Some kind of expression of intent and basic place in the architecture should be 
included in all classes.

The good news is that at least there are 1838 (1868 in 1.1.0 branch) classes 
that have at least some kind of javadocs. 

I would be happy to help write comments, but I can't figure out what these 
classes do.







[jira] [Updated] (DRILL-3461) Need to meet basic coding standards

2015-07-06 Thread Ted Dunning (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Dunning updated DRILL-3461:
---
Attachment: no-javadoc-no-comments.txt
no-comments.txt

Here are other views of the situation.  A quick summary is that only one file 
that has no javadoc has any // comments.

 Need to meet basic coding standards
 ---

 Key: DRILL-3461
 URL: https://issues.apache.org/jira/browse/DRILL-3461
 Project: Apache Drill
  Issue Type: Bug
Reporter: Ted Dunning
 Attachments: no-comments.txt, no-javadoc-no-comments.txt, 
 no-javadocs.txt


 1220 classes in Drill have no Javadocs whatsoever.  I will attach a detailed 
 list.
 Some kind of expression of intent and basic place in the architecture should 
 be included in all classes.
 The good news is that at least there are 1838 (1868 in 1.1.0 branch) classes 
 that have at least some kind of javadocs. 
 I would be happy to help write comments, but I can't figure out what these 
 classes do.





[jira] [Created] (DRILL-3462) There appears to be no way to have complex intermediate state

2015-07-06 Thread Ted Dunning (JIRA)
Ted Dunning created DRILL-3462:
--

 Summary: There appears to be no way to have complex intermediate 
state
 Key: DRILL-3462
 URL: https://issues.apache.org/jira/browse/DRILL-3462
 Project: Apache Drill
  Issue Type: Bug
Reporter: Ted Dunning


After spending several frustrating days on the problem (see also DRILL-3461), 
it appears that there is no viable idiom for building an aggregator that has 
internal state that is anything more than a scalar.

What is needed is:

1) The ability to allocate a Repeated* type for use in @Workspace variables.  
Currently, calling new works to get the basic structure, but there is no good 
way to allocate the corresponding vector.

2) The ability to use and to allocate a ComplexWriter in the Workspace 
variables.

3) The ability to write a UDAF that supports multi-phase aggregation.  It would 
be just fine if I simply have to write a combine method on my UDAF class.  I 
don't think that there is any way to infer such a combiner from the parameters 
and workspace variables.  An alternative API would be to have a form of the 
output function that is given an IterableOutputClass, but that is probably 
much less efficient than simply having a combine method that is called 
repeatedly.
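Item (3) above can be sketched outside Drill. The following is a minimal,
hypothetical Python illustration (not Drill's UDAF API; the class and method
names are invented for this sketch) of why a single combine method is enough
to support multi-phase aggregation when the intermediate state is more than
one scalar:

```python
# Hypothetical sketch of the add/combine/output life-cycle requested in (3).
# Not Drill's UDAF API; it only illustrates how partial aggregates with
# non-scalar state (here a count/total pair) merge via combine().

class MeanAggregator:
    """Intermediate state is a (count, total) pair, i.e. more than a scalar."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def add(self, value):
        # Phase 1: each fragment folds its own rows into local state.
        self.count += 1
        self.total += value

    def combine(self, other):
        # Phase 2: partial states from different fragments are merged.
        self.count += other.count
        self.total += other.total

    def output(self):
        # Final phase: produce the result from the merged state.
        return self.total / self.count if self.count else None


# Two "fragments" aggregate disjoint slices, then their states are combined.
left, right = MeanAggregator(), MeanAggregator()
for v in [1.0, 2.0, 3.0]:
    left.add(v)
for v in [4.0, 5.0]:
    right.add(v)
left.combine(right)
print(left.output())  # 3.0
```

Note that combine() is called on pairs of states, so the planner can merge
partial aggregates in any order without an iterable-output API.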






[jira] [Updated] (DRILL-3461) Need to add javadocs to class where they are missing

2015-07-06 Thread Ted Dunning (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Dunning updated DRILL-3461:
---
Attachment: (was: no-javadoc-no-comments.txt)

 Need to add javadocs to class where they are missing
 

 Key: DRILL-3461
 URL: https://issues.apache.org/jira/browse/DRILL-3461
 Project: Apache Drill
  Issue Type: Bug
Reporter: Ted Dunning
 Attachments: no-javadocs-templates.txt, no-javadocs.txt, 
 no-javadocs.txt


 1220 classes in Drill have no Javadocs whatsoever.  I will attach a detailed 
 list.
 Some kind of expression of intent and basic place in the architecture should 
 be included in all classes.
 The good news is that at least there are 1838 (1868 in 1.1.0 branch) classes 
 that have at least some kind of javadocs. 
 I would be happy to help write comments, but I can't figure out what these 
 classes do.





[jira] [Updated] (DRILL-3461) Need to add javadocs to class where they are missing

2015-07-06 Thread Ted Dunning (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Dunning updated DRILL-3461:
---
Attachment: (was: no-comments.txt)

 Need to add javadocs to class where they are missing
 

 Key: DRILL-3461
 URL: https://issues.apache.org/jira/browse/DRILL-3461
 Project: Apache Drill
  Issue Type: Bug
Reporter: Ted Dunning
 Attachments: no-javadocs-templates.txt, no-javadocs.txt, 
 no-javadocs.txt


 1220 classes in Drill have no Javadocs whatsoever.  I will attach a detailed 
 list.
 Some kind of expression of intent and basic place in the architecture should 
 be included in all classes.
 The good news is that at least there are 1838 (1868 in 1.1.0 branch) classes 
 that have at least some kind of javadocs. 
 I would be happy to help write comments, but I can't figure out what these 
 classes do.





[jira] [Updated] (DRILL-3461) Need to add javadocs to class where they are missing

2015-07-06 Thread Ted Dunning (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Dunning updated DRILL-3461:
---
Attachment: no-javadocs-templates.txt
no-javadocs.txt

Updated lists of files.  Current count is 1239 java files with no javadoc and 
67 templates.  The previous count was possibly distorted by generated files and 
seeing the Apache license as a comment.

 Need to add javadocs to class where they are missing
 

 Key: DRILL-3461
 URL: https://issues.apache.org/jira/browse/DRILL-3461
 Project: Apache Drill
  Issue Type: Bug
Reporter: Ted Dunning
 Attachments: no-comments.txt, no-javadoc-no-comments.txt, 
 no-javadocs-templates.txt, no-javadocs.txt, no-javadocs.txt


 1220 classes in Drill have no Javadocs whatsoever.  I will attach a detailed 
 list.
 Some kind of expression of intent and basic place in the architecture should 
 be included in all classes.
 The good news is that at least there are 1838 (1868 in 1.1.0 branch) classes 
 that have at least some kind of javadocs. 
 I would be happy to help write comments, but I can't figure out what these 
 classes do.





[jira] [Commented] (DRILL-3444) Implement Is Not Null/Is Null on List of objects - [isnotnull(MAP-REPEATED)] error

2015-07-01 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611048#comment-14611048
 ] 

Ted Dunning commented on DRILL-3444:



I talked to Tug about this today and we walked through the code looking at how 
to fix this.

The key gap is a missing IsNull operator.  Tug started trying to figure out 
how to write such an operator.  Right now we have a bunch of template-generated 
operators for all of the specific scalar types and also the uniform list 
types.  What we don't have is a null operator for general lists.

Can somebody point at how such an operator ought to be implemented?  



 Implement Is Not Null/Is Null on List of objects -  [isnotnull(MAP-REPEATED)] 
 error
 ---

 Key: DRILL-3444
 URL: https://issues.apache.org/jira/browse/DRILL-3444
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Data Types, Functions - Drill
 Environment: Drill 1.0
Reporter: Tugdual Grall
Assignee: Daniel Barclay (Drill)
Priority: Critical

 It is not possible to use the IS NULL / IS NOT NULL operators on an attribute 
 that contains a list of objects. (They work with a list of scalar types.)
 Query:
 {code}
 select *
 from dfs.`/working/json_array/*.json` p
 where p.tags IS NOT NULL
 {code}
 Document:
 {code}
 {
   "name" : "PPRODUCT_002",
   "price" : 200.00,
   "tags" : [ { "type" : "sports" } , { "type" : "ocean" } ]
 }
 {code}
 Error:
 {code}
 org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
 org.apache.drill.exec.exception.SchemaChangeException: Failure while trying 
 to materialize incoming schema. Errors: Error in expression at index -1. 
 Error: Missing function implementation: [isnotnull(MAP-REPEATED)]. Full 
 expression: --UNKNOWN EXPRESSION--.. Fragment 0:0 [Error Id: 
 384e6b86-ce17-4eb9-b5eb-27870a341c90 on 192.168.99.13:31010]
 {code}
 Workaround:
 By using a sub element it is working, for example:
 {code}
 select *
 from dfs.`/Users/tgrall/working/json_array/*.json` p
 where p.tags.type IS NULL
 {code}





[jira] [Created] (DRILL-3226) Upper and lower casing doesn't work correctly on non-ASCII characters

2015-05-31 Thread Ted Dunning (JIRA)
Ted Dunning created DRILL-3226:
--

 Summary: Upper and lower casing doesn't work correctly on 
non-ASCII characters
 Key: DRILL-3226
 URL: https://issues.apache.org/jira/browse/DRILL-3226
 Project: Apache Drill
  Issue Type: Bug
Reporter: Ted Dunning


{code}
0: jdbc:drill:zk=local> select z, lower(z), upper(z) from 
dfs.root.`/Users/tdunning/tmp/data.json`;
+--+-+-+
|  z   | EXPR$1  | EXPR$2  |
+--+-+-+
| åäö  | åäö | åäö |
| aBc  | abc | ABC |
+--+-+-+
{code}
Expected result would be
{code}
+--+-+-+
|  z   | EXPR$1  | EXPR$2  |
+--+-+-+
| åäö  | åäö | ÅÄÖ |
| aBc  | abc | ABC |
+--+-+-+
{code}
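For comparison, the expected result matches Unicode default case mapping,
which Python's built-in str casing follows; the buggy output is what a purely
ASCII-range implementation would produce. This sketch is an illustration of
that contrast, not Drill's actual code:

```python
# Unicode-aware case mapping produces the "expected result" table above.
z = "åäö"
print(z.lower())  # åäö
print(z.upper())  # ÅÄÖ

# A purely ASCII-range implementation leaves non-ASCII letters untouched,
# reproducing the reported buggy output (upper("åäö") returning "åäö"):
def ascii_upper(s):
    return "".join(chr(ord(c) - 32) if "a" <= c <= "z" else c for c in s)

print(ascii_upper("åäö"))  # åäö  (wrong: non-ASCII letters untouched)
print(ascii_upper("aBc"))  # ABC
```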






[jira] [Created] (DRILL-3222) Need a zip function to combine coordinated lists

2015-05-30 Thread Ted Dunning (JIRA)
Ted Dunning created DRILL-3222:
--

 Summary: Need a zip function to combine coordinated lists
 Key: DRILL-3222
 URL: https://issues.apache.org/jira/browse/DRILL-3222
 Project: Apache Drill
  Issue Type: Bug
Reporter: Ted Dunning


It is often very useful to be able to turn a pair (or more) of lists into a 
single list of pairs.  Thus zip([a,b], [1,2]) = [[a,1], [b,2]].

The handling of short lists, more than two lists and so on is TBD, but the base 
function is an important one.

One use case is in time series where storing times as one list and values as 
another is very handy but processing these results would be much better done by 
using flatten(zip(times, values)).
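The requested semantics can be sketched in Python (a hypothetical illustration
of the behaviour asked for, not Drill syntax; the short-list handling here
truncates to the shortest input, which the ticket leaves TBD):

```python
# Sketch of the proposed zip: zip(["a","b"], [1,2]) -> [["a",1], ["b",2]].
def zip_lists(*lists):
    # The ticket leaves short-list handling TBD; this sketch truncates
    # to the shortest input, which is one plausible choice.
    return [list(t) for t in zip(*lists)]

print(zip_lists(["a", "b"], [1, 2]))  # [['a', 1], ['b', 2]]

# The time-series use case: coordinated times/values lists become one
# list of [time, value] pairs...
times = [1000, 2000, 3000]
values = [3.5, 2.7, 4.1]
pairs = zip_lists(times, values)
# ...which flatten(zip(times, values)) would then expand, one row per pair:
for row in pairs:
    print(row)
```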






[jira] [Created] (DRILL-3164) Compilation fails with Java 8

2015-05-21 Thread Ted Dunning (JIRA)
Ted Dunning created DRILL-3164:
--

 Summary: Compilation fails with Java 8
 Key: DRILL-3164
 URL: https://issues.apache.org/jira/browse/DRILL-3164
 Project: Apache Drill
  Issue Type: Bug
Reporter: Ted Dunning


I just got this:
{code}
ted:drill[1.0.0*]$ mvn package -DskipTests
...
Detected JDK Version: 1.8.0-40 is not in the allowed range [1.7,1.8).
...
{code}
Clearly there is an overly restrictive pattern at work.






[jira] [Commented] (DRILL-2620) Casting to float is changing the value slightly

2015-03-30 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386941#comment-14386941
 ] 

Ted Dunning commented on DRILL-2620:


What did you expect to see?

In SQL the default precision of a FLOAT is implementation defined.  I strongly 
suspect that in Drill the default is 24 (i.e. single precision).  If you care 
(and you seem to), you might be better served by specifying DOUBLE as the type 
or FLOAT(53).

Single precision floating point (aka float) only provides 6 digits of 
precision.  You, as the lucky person you are, got 7.

http://en.wikipedia.org/wiki/Single-precision_floating-point_format
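The reported digits are exactly what a single-precision round-trip produces.
A small Python check, using struct to emulate a 24-bit-significand FLOAT:

```python
import struct

def to_float32(x):
    # Round-trip a Python double through IEEE-754 single precision
    # (24-bit significand), which is what a cast to a precision-24
    # FLOAT does to the value.
    return struct.unpack("<f", struct.pack("<f", x))[0]

print(to_float32(2345552345.5342))  # 2345552384.0  -> shown as 2.34555238E9
print(to_float32(4784.5735))        # 4784.57373046875 -> shown as 4784.5737
```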

 Casting to float is changing the value slightly
 ---

 Key: DRILL-2620
 URL: https://issues.apache.org/jira/browse/DRILL-2620
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Data Types
Reporter: Rahul Challapalli
Assignee: Daniel Barclay (Drill)

 git.commit.id.abbrev=c11fcf7
 Data Set :
 {code}
 2345552345.5342
 4784.5735
 {code}
 Drill Query :
 {code}
 select cast(columns[0] as float) from `abc.tbl`;
 ++
 |   EXPR$0   |
 ++
 | 2.34555238E9 |
 | 4784.5737  |
 ++
 {code}
 I am not sure whether this is a known limitation or a bug





[jira] [Commented] (DRILL-1918) Drill does very obscure things when a JRE is used instead of a JDK.

2015-01-01 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14262674#comment-14262674
 ] 

Ted Dunning commented on DRILL-1918:


oops.  Excess of zeal in creating JIRA's

 Drill does very obscure things when a JRE is used instead of a JDK.
 ---

 Key: DRILL-1918
 URL: https://issues.apache.org/jira/browse/DRILL-1918
 Project: Apache Drill
  Issue Type: Bug
Reporter: Ted Dunning

 In 
 http://answers.mapr.com/questions/161911/apache-drill-07-errors.html#comment-161933
  a user describes the consequences of running Drill with a JRE instead of a 
 JDK.  
 Surely this could be detected and this confusion could be avoided.


