[jira] [Commented] (DRILL-7443) Enable PCAP Plugin to Reassemble TCP Streams
[ https://issues.apache.org/jira/browse/DRILL-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979839#comment-16979839 ]

ASF GitHub Bot commented on DRILL-7443:
---

cgivre commented on pull request #1898: DRILL-7443: Enable PCAP Plugin to Reassemble TCP Streams
URL: https://github.com/apache/drill/pull/1898#discussion_r349429668

##
File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/schema/PcapTypes.java
##
@@ -22,5 +22,6 @@
   INTEGER,
   STRING,
   LONG,
-  TIMESTAMP
+  TIMESTAMP,
+  DURATION

Review comment:
   Done

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Enable PCAP Plugin to Reassemble TCP Streams
> ---
>
>                 Key: DRILL-7443
>                 URL: https://issues.apache.org/jira/browse/DRILL-7443
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Other
>    Affects Versions: 1.16.0
>            Reporter: Charles Givre
>            Assignee: Charles Givre
>            Priority: Major
>             Fix For: 1.17.0
>
>
> One common task in network forensics is reassembling TCP streams from
> captured network data. This PR adds this capability to Drill.
> h2. Usage
> To enable TCP re-sessionization, set the {{sessionizeTCPStreams}} variable
> to {{true}} in the configuration for the PCAP reader.
> This can also be done at query time with the {{table()}} function:
> {{SELECT * FROM table(dfs.test.`attack-trace.pcap` (type => 'pcap', sessionizeTCPStreams => true))}}
> h3. Results
> *When this option is enabled, Drill ignores all packets that are not TCP packets.*
> Enabling this option changes the results Drill returns from PCAP files.
> You will get the following columns:
> * session_start_time: The start time of the session
> * session_end_time: The end time of the session
> * session_duration: The duration of the session, returned as a Drill PERIOD data type
> * total_packet_count: The number of packets in the session
> * connection_time: The time it took to complete the TCP handshake; useful for network diagnostics
> * src_ip: The IP address of the originating machine
> * dst_ip: The IP address of the remote machine
> * src_port: The port of the originating machine
> * dst_port: The port of the remote machine
> * src_mac_address: The MAC address of the originating machine
> * dst_mac_address: The MAC address of the remote machine
> * tcp_session: The session hash for the TCP session (Long)
> * is_corrupt: True if the session contains corrupted packets, false otherwise
> * data_from_originator: The data sent from the originating machine
> * data_from_remote: The data sent from the remote machine
> * data_volume_from_remote: The number of bytes sent from the remote machine
> * data_volume_from_origin: The number of bytes sent from the originating machine
> * packet_count_from_origin: The number of packets sent from the originating machine
> * packet_count_from_remote: The number of packets sent from the remote machine

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
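As a worked example of how session_duration relates to session_start_time and session_end_time, the sketch below recomputes the duration for the start and end timestamps used in the test baseline in this thread (03:28:28.374 and 03:28:28.508). It uses `java.time.Duration` rather than the Joda `Period` the PR uses, and it assumes the test's timestamp pattern is `yyyy-MM-dd'T'HH:mm:ss.SSS`. The `elapsed` helper is a hypothetical illustration of how a millisecond-based value such as connection_time could be derived from two packet timestamps; it is not the PR's code.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class SessionDurationSketch {

  // Assumed timestamp pattern, matching the baseline strings in the test.
  private static final DateTimeFormatter FORMATTER =
      DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS");

  /** Elapsed time between two formatted timestamps, as an ISO-8601 duration. */
  static Duration sessionDuration(String start, String end) {
    LocalDateTime startTime = LocalDateTime.parse(start, FORMATTER);
    LocalDateTime endTime = LocalDateTime.parse(end, FORMATTER);
    return Duration.between(startTime, endTime);
  }

  /** Hypothetical helper: elapsed time between two epoch-millisecond timestamps. */
  static Duration elapsed(long fromMillis, long toMillis) {
    return Duration.ofMillis(toMillis - fromMillis);
  }

  public static void main(String[] args) {
    Duration d = sessionDuration("2009-04-20T03:28:28.374", "2009-04-20T03:28:28.508");
    System.out.println(d);            // PT0.134S
    System.out.println(d.toMillis()); // 134
  }
}
```

`Duration.toString()` prints the same ISO-8601 form (`PT0.134S`) that the test passes to `Period.parse`, so the two representations line up for sub-day durations like these.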
[jira] [Commented] (DRILL-7443) Enable PCAP Plugin to Reassemble TCP Streams
[ https://issues.apache.org/jira/browse/DRILL-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979837#comment-16979837 ]

ASF GitHub Bot commented on DRILL-7443:
---

cgivre commented on pull request #1898: DRILL-7443: Enable PCAP Plugin to Reassemble TCP Streams
URL: https://github.com/apache/drill/pull/1898#discussion_r349429600

##
File path: exec/java-exec/src/test/java/org/apache/drill/exec/store/pcap/TestSessionizePCAP.java
##
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.pcap;
+
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.ClusterTest;
+import org.joda.time.Period;
+import java.nio.file.Paths;
+import java.time.LocalDateTime;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import java.time.format.DateTimeFormatter;
+
+import static org.junit.Assert.assertEquals;
+
+public class TestSessionizePCAP extends ClusterTest{
+
+  private static final DateTimeFormatter formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS");
+
+  @BeforeClass
+  public static void setup() throws Exception {
+    ClusterTest.startCluster(ClusterFixture.builder(dirTestWatcher));
+
+    PcapFormatConfig sampleConfig = new PcapFormatConfig();
+    sampleConfig.sessionizeTCPStreams = true;
+
+    cluster.defineFormat("cp", "pcap", sampleConfig);
+    dirTestWatcher.copyResourceToRoot(Paths.get("store/pcap/"));
+  }
+
+  @Test
+  public void testSessionizedStarQuery() throws Exception {
+    String sql = "SELECT * FROM cp.`/store/pcap/attack-trace.pcap` LIMIT 1";

Review comment:
   Fixed with `WHERE` clause.
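The test fixture above turns on `sessionizeTCPStreams`, under which Drill drops non-TCP packets and returns one row per TCP session. The sketch below illustrates that filtering and grouping step in plain Java. The `SimplePacket` class and the string session key are hypothetical stand-ins: the PR keys sessions on a value returned by `Packet.getSessionHash()`, whose computation is not shown in this thread. The only property the sketch relies on is that both directions of a connection map to the same key, which it gets by ordering the two endpoints.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SessionizeSketch {

  /** Hypothetical, minimal packet: protocol name plus both endpoints. */
  static final class SimplePacket {
    final String type;
    final String srcIp;
    final int srcPort;
    final String dstIp;
    final int dstPort;

    SimplePacket(String type, String srcIp, int srcPort, String dstIp, int dstPort) {
      this.type = type;
      this.srcIp = srcIp;
      this.srcPort = srcPort;
      this.dstIp = dstIp;
      this.dstPort = dstPort;
    }
  }

  /** Direction-independent key: A->B and B->A produce the same string. */
  static String sessionKey(SimplePacket p) {
    String a = p.srcIp + ":" + p.srcPort;
    String b = p.dstIp + ":" + p.dstPort;
    return a.compareTo(b) <= 0 ? a + "|" + b : b + "|" + a;
  }

  /** Drops non-TCP packets and groups the rest by session, keeping arrival order. */
  static Map<String, List<SimplePacket>> sessionize(List<SimplePacket> packets) {
    Map<String, List<SimplePacket>> sessions = new LinkedHashMap<>();
    for (SimplePacket p : packets) {
      if (!"TCP".equalsIgnoreCase(p.type)) {
        continue; // non-TCP packets are ignored when sessionizing
      }
      sessions.computeIfAbsent(sessionKey(p), k -> new ArrayList<>()).add(p);
    }
    return sessions;
  }

  public static void main(String[] args) {
    List<SimplePacket> capture = new ArrayList<>();
    // Made-up sample input: one TCP exchange in both directions plus a UDP packet.
    capture.add(new SimplePacket("TCP", "98.114.205.102", 1821, "192.150.11.111", 445));
    capture.add(new SimplePacket("TCP", "192.150.11.111", 445, "98.114.205.102", 1821));
    capture.add(new SimplePacket("UDP", "192.150.11.111", 53, "98.114.205.102", 5353));
    System.out.println(sessionize(capture).size()); // 1
  }
}
```

A string key keeps the example readable; a real implementation would more likely hash the ordered endpoint tuple into a `long`, which is what the tcp_session column's type suggests.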
[jira] [Commented] (DRILL-7443) Enable PCAP Plugin to Reassemble TCP Streams
[ https://issues.apache.org/jira/browse/DRILL-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979833#comment-16979833 ]

ASF GitHub Bot commented on DRILL-7443:
---

cgivre commented on pull request #1898: DRILL-7443: Enable PCAP Plugin to Reassemble TCP Streams
URL: https://github.com/apache/drill/pull/1898#discussion_r349429427

##
File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/decoder/TcpSession.java
##
@@ -0,0 +1,334 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.pcap.decoder;
+
+import org.joda.time.Instant;
+import org.joda.time.Period;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.net.InetAddress;
+import java.util.ArrayList;
+import java.util.Collections;
+
+import static org.apache.drill.exec.store.pcap.PcapFormatUtils.parseBytesToASCII;
+
+/**
+ * This class is the representation of a TCP session.
+ */
+public class TcpSession {
+
+  private ArrayList<Packet> packetsFromSender;
+  private ArrayList<Packet> packetsFromReceiver;
+
+  private long startTime;
+  private long endTime;
+  private long sessionLength;
+  private int packetCount;
+  private InetAddress srcIP;
+  private InetAddress dstIP;
+  private int srcPort;
+  private int dstPort;
+  private String srcMac;
+  private String dstMac;
+  private long sessionID;
+  private TcpHandshake handshake;
+  private long synTime;
+  private long ackTime;
+  private long connectTime;
+  private byte[] sentData;
+  private byte[] receivedData;
+  private int sentDataSize;
+  private int receivedDataSize;
+  private boolean hasCorruptedData = false;
+
+
+  private static final Logger logger = LoggerFactory.getLogger(TcpSession.class);
+
+  public TcpSession (long sessionID) {
+    packetsFromSender = new ArrayList<>();
+    packetsFromReceiver = new ArrayList<>();
+
+    handshake = new TcpHandshake();
+    this.sessionID = sessionID;
+  }
+
+  /**
+   * This function adds a packet to the TCP session.
+   * @param p The Packet to be added to the session
+   */
+  public void addPacket(Packet p) {
+
+    // Only attempt to add TCP packets to session
+    if (!p.getPacketType().equalsIgnoreCase("TCP")) {
+      return;
+    }
+
+    // These variables should be consistent within a TCP session
+    if (packetCount == 0) {
+      srcIP = p.getSrc_ip();
+      dstIP = p.getDst_ip();
+
+      srcPort = p.getSrc_port();
+      dstPort = p.getDst_port();
+
+      srcMac = p.getEthernetSource();
+      dstMac = p.getEthernetDestination();
+      startTime = p.getTimestamp();
+    } else if (p.getSessionHash() != sessionID) {
+      logger.warn("Attempting to add session {} to incorrect TCP session.", sessionID);
+      return;
+    }
+
+    // Add packet to appropriate list and increment the data size counter
+    if (p.getSrc_ip().getHostAddress().equalsIgnoreCase(srcIP.getHostAddress())) {
+      packetsFromSender.add(p);
+      // Increment the data size counters
+      if (p.getData() != null) {
+        sentDataSize += p.getData().length;
+      }
+
+    } else {
+      packetsFromReceiver.add(p);
+      if (p.getData() != null) {
+        receivedDataSize += p.getData().length;
+      }
+    }
+
+    // Check flags if connection is not established
+    if (!handshake.isConnected()) {
+      if (p.getSynFlag() && p.getSrc_ip().getHostAddress().equalsIgnoreCase(srcIP.getHostAddress())) {
+        // This is part 1 of the TCP session handshake
+        // The host sends the first SYN packet
+        handshake.syn = true;
+        handshake.setSyn();
+        synTime = p.getTimestamp();
+      } else if (p.getSynFlag() && p.getAckFlag() && p.getSrc_ip().getHostAddress().equalsIgnoreCase(dstIP.getHostAddress())) {
+        // This condition represents the second part of the TCP Handshake,
+        // where the receiver sends a frame with the SYN/ACK flags set to the originator
+        handshake.synAck = true;
+        handshake.setAck();
+      } else if (p.getAckFlag() && p.getSrc_ip().getHostAddress().equalsIgnoreCase(srcIP.getHostAddress())) {
+        // Finally, this condition represents a successful opening
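The direction bookkeeping in `addPacket()` reduces to a simple rule: the first packet fixes who the originator is, and each later packet is attributed to one side by comparing its source address, with payload lengths summed per side. The sketch below isolates that rule in a hypothetical `DirectionCounters` class; it is not the PR's `TcpSession`. Like the diff, it compares only IP addresses (not ports) and treats a null payload as zero bytes.

```java
import java.util.Objects;

public class DirectionCounters {

  private String originatorIp;   // fixed by the first packet seen
  private int packetsFromOrigin;
  private int packetsFromRemote;
  private long bytesFromOrigin;
  private long bytesFromRemote;

  /**
   * Attributes one packet to a direction and updates the counters.
   *
   * @param srcIp   source address of the packet
   * @param payload TCP payload, or null when the packet carries no data
   */
  public void add(String srcIp, byte[] payload) {
    Objects.requireNonNull(srcIp, "srcIp");
    if (originatorIp == null) {
      originatorIp = srcIp; // the first packet defines the originator
    }
    int length = payload == null ? 0 : payload.length;
    if (originatorIp.equalsIgnoreCase(srcIp)) {
      packetsFromOrigin++;
      bytesFromOrigin += length;
    } else {
      packetsFromRemote++;
      bytesFromRemote += length;
    }
  }

  public int packetsFromOrigin() { return packetsFromOrigin; }
  public int packetsFromRemote() { return packetsFromRemote; }
  public long bytesFromOrigin() { return bytesFromOrigin; }
  public long bytesFromRemote() { return bytesFromRemote; }
  public int totalPackets() { return packetsFromOrigin + packetsFromRemote; }

  public static void main(String[] args) {
    DirectionCounters c = new DirectionCounters();
    c.add("98.114.205.102", null);                  // SYN, no payload
    c.add("192.150.11.111", null);                  // SYN/ACK, no payload
    c.add("98.114.205.102", new byte[] {1, 2, 3});  // data from the originator
    System.out.println(c.packetsFromOrigin() + " " + c.bytesFromOrigin()); // 2 3
  }
}
```

These counters correspond to the packet_count_from_* and data_volume_from_* columns listed in the issue description.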
[jira] [Commented] (DRILL-7443) Enable PCAP Plugin to Reassemble TCP Streams
[ https://issues.apache.org/jira/browse/DRILL-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979835#comment-16979835 ]

ASF GitHub Bot commented on DRILL-7443:
---

cgivre commented on pull request #1898: DRILL-7443: Enable PCAP Plugin to Reassemble TCP Streams
URL: https://github.com/apache/drill/pull/1898#discussion_r349429564

##
File path: exec/java-exec/src/test/java/org/apache/drill/exec/store/pcap/TestSessionizePCAP.java
##
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.pcap;
+
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.ClusterTest;
+import org.joda.time.Period;
+import java.nio.file.Paths;
+import java.time.LocalDateTime;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import java.time.format.DateTimeFormatter;
+
+import static org.junit.Assert.assertEquals;
+
+public class TestSessionizePCAP extends ClusterTest{
+
+  private static final DateTimeFormatter formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS");
+
+  @BeforeClass
+  public static void setup() throws Exception {
+    ClusterTest.startCluster(ClusterFixture.builder(dirTestWatcher));
+
+    PcapFormatConfig sampleConfig = new PcapFormatConfig();
+    sampleConfig.sessionizeTCPStreams = true;
+
+    cluster.defineFormat("cp", "pcap", sampleConfig);
+    dirTestWatcher.copyResourceToRoot(Paths.get("store/pcap/"));
+  }
+
+  @Test
+  public void testSessionizedStarQuery() throws Exception {
+    String sql = "SELECT * FROM cp.`/store/pcap/attack-trace.pcap` LIMIT 1";
+
+    testBuilder()
+      .sqlQuery(sql)
+      .ordered()
+      .baselineColumns("session_start_time", "session_end_time", "session_duration", "total_packet_count", "connection_time", "src_ip", "dst_ip", "src_port", "dst_port",
+        "src_mac_address", "dst_mac_address", "tcp_session", "is_corrupt", "data_from_originator", "data_from_remote", "data_volume_from_origin",
+        "data_volume_from_remote", "packet_count_from_origin", "packet_count_from_remote")
+      .baselineValues(LocalDateTime.parse("2009-04-20T03:28:28.374", formatter),
+        LocalDateTime.parse("2009-04-20T03:28:28.508", formatter),
+        Period.parse("PT0.134S"), 4,
+        Period.parse("PT0.119S"),
+        "98.114.205.102",
+        "192.150.11.111",
+        1821, 445,
+        "00:08:E2:3B:56:01",
+        "00:30:48:62:4E:4A",
+        -8791568836279708938L,
+        false,
+        "I>...>..Ib...<...<..I>...>", "", 62,0, 3, 1)
+      .go();
+  }
+
+  @Test
+  public void testSessionizedSpecificQuery() throws Exception {
+    String sql = "SELECT session_start_time, session_end_time,session_duration, total_packet_count, connection_time, src_ip, dst_ip, src_port, dst_port, src_mac_address, dst_mac_address, tcp_session, " +
+      "is_corrupt, data_from_originator, data_from_remote, data_volume_from_origin, data_volume_from_remote, packet_count_from_origin, packet_count_from_remote " +
+      "FROM cp.`/store/pcap/attack-trace.pcap` LIMIT 1";
+
+    testBuilder()
+      .sqlQuery(sql)
+      .ordered()
+      .baselineColumns("session_start_time", "session_end_time", "session_duration", "total_packet_count", "connection_time", "src_ip", "dst_ip", "src_port", "dst_port",
+        "src_mac_address", "dst_mac_address", "tcp_session", "is_corrupt", "data_from_originator", "data_from_remote", "data_volume_from_origin",
+        "data_volume_from_remote", "packet_count_from_origin", "packet_count_from_remote")
+      .baselineValues(LocalDateTime.parse("2009-04-20T03:28:28.374", formatter),
+        LocalDateTime.parse("2009-04-20T03:28:28.508", formatter),
+        Period.parse("PT0.134S"), 4,
+        Period.parse("PT0.119S"),
+        "98.114.205.102",
+        "192.150.11.111",
+        1821, 445,
+        "00:08:E2:3B:56:01",
+        "00:30:48:62:4E:4A",
+        -8791568836279708938L,
+        false,
+        "I>...>..Ib...<...<..I>...>", "", 62,0, 3, 1)
+      .go();
+  }
+
+  @Test
+  public void testSerDe() throws Exception {
+    String
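The expected data_from_originator value in the test baseline (`"I>...>..Ib...<...<..I>...>"`) shows payload bytes rendered as text, with non-printable bytes replaced by dots. `TcpSession` imports `PcapFormatUtils.parseBytesToASCII` for this, but that helper's implementation is not part of this thread. The following is a hypothetical stand-in, assuming "printable" means ASCII 0x20 (space) through 0x7E (`~`).

```java
public class PrintableAscii {

  /**
   * Renders bytes as ASCII text, replacing every byte outside the printable
   * range 0x20-0x7E with '.'. Hypothetical helper, not Drill's implementation.
   */
  static String toPrintableAscii(byte[] data) {
    if (data == null) {
      return "";
    }
    StringBuilder sb = new StringBuilder(data.length);
    for (byte b : data) {
      int v = b & 0xFF; // treat the byte as unsigned
      sb.append(v >= 0x20 && v <= 0x7E ? (char) v : '.');
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    byte[] payload = {0x49, 0x3E, 0x00, (byte) 0x90, 0x0A, 0x3E};
    System.out.println(toPrintableAscii(payload)); // I>...>
  }
}
```

Masking with `0xFF` matters because Java bytes are signed: without it, a byte such as `0x90` would be negative and the range check would still reject it, but values would be harder to reason about when extending the printable set.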
[jira] [Commented] (DRILL-7443) Enable PCAP Plugin to Reassemble TCP Streams
[ https://issues.apache.org/jira/browse/DRILL-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979832#comment-16979832 ]

ASF GitHub Bot commented on DRILL-7443:
---

cgivre commented on pull request #1898: DRILL-7443: Enable PCAP Plugin to Reassemble TCP Streams
URL: https://github.com/apache/drill/pull/1898#discussion_r349429343

##
File path: exec/java-exec/src/test/java/org/apache/drill/exec/store/pcap/TestSessionizePCAP.java
##
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.pcap;
+
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.ClusterTest;
+import org.joda.time.Period;
+import java.nio.file.Paths;
+import java.time.LocalDateTime;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import java.time.format.DateTimeFormatter;
+
+import static org.junit.Assert.assertEquals;
+
+public class TestSessionizePCAP extends ClusterTest{

Review comment:
   Fixed
[jira] [Commented] (DRILL-7443) Enable PCAP Plugin to Reassemble TCP Streams
[ https://issues.apache.org/jira/browse/DRILL-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979831#comment-16979831 ]

ASF GitHub Bot commented on DRILL-7443:
---

cgivre commented on pull request #1898: DRILL-7443: Enable PCAP Plugin to Reassemble TCP Streams
URL: https://github.com/apache/drill/pull/1898#discussion_r349429318

##
File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/decoder/TcpSession.java
##
[jira] [Commented] (DRILL-7443) Enable PCAP Plugin to Reassemble TCP Streams
[ https://issues.apache.org/jira/browse/DRILL-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979830#comment-16979830 ]

ASF GitHub Bot commented on DRILL-7443:
---

cgivre commented on pull request #1898: DRILL-7443: Enable PCAP Plugin to Reassemble TCP Streams
URL: https://github.com/apache/drill/pull/1898#discussion_r349429229

##
File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/decoder/TcpSession.java
##
[jira] [Commented] (DRILL-7443) Enable PCAP Plugin to Reassemble TCP Streams
[ https://issues.apache.org/jira/browse/DRILL-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979828#comment-16979828 ]

ASF GitHub Bot commented on DRILL-7443:
---------------------------------------

cgivre commented on pull request #1898: DRILL-7443: Enable PCAP Plugin to Reassemble TCP Streams
URL: https://github.com/apache/drill/pull/1898#discussion_r349429030

##
File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/decoder/TcpHandshake.java
##
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.pcap.decoder;
+
+/**
+ * This class is used to record the status of the TCP Handshake. Initially this is used just to determine whether a session is open or closed, but
+ * future functionality could include SYN flood identification, or other hackery with TCP flags.
+ */
+public class TcpHandshake {
+  boolean syn = false;
+
+  boolean synAck = false;
+
+  boolean ack = false;
+
+  boolean finAck = false;
+
+  boolean isConnected = false;
+
+  boolean isClosed = false;
+
+  long sessionID;
+
+  State currentSessionState = State.NONE;
+
+  enum State {

Review comment:
Done and done
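The `enum State` commented on here (its values, quoted elsewhere in this thread, are `NONE, OPEN, CLOSED, CLOSE_WAIT, TIME_WAIT, SYN, SYNACK, FORCED_CLOSED, FIN_WAIT`) suggests a flag-driven state machine. The Java sketch below shows one way the opening handshake could be tracked with it, assuming packets arrive in capture order with their SYN, ACK and RST flags already decoded. The class `HandshakeSketch`, its `onPacket` method and the transition rules are hypothetical illustrations, not the code in Drill's `TcpHandshake`.

```java
// Hypothetical sketch: tracks the TCP three-way handshake (SYN, SYN-ACK, ACK)
// from observed TCP flags. Names and transitions are illustrative assumptions,
// not Drill's TcpHandshake implementation.
class HandshakeSketch {

  // State names copied from the TcpHandshake.State enum in the PR diff;
  // this sketch only uses NONE, SYN, SYNACK, OPEN and FORCED_CLOSED.
  enum State { NONE, OPEN, CLOSED, CLOSE_WAIT, TIME_WAIT, SYN, SYNACK, FORCED_CLOSED, FIN_WAIT }

  private State state = State.NONE;

  /** Advances the handshake state for one packet's SYN, ACK and RST flags. */
  void onPacket(boolean syn, boolean ack, boolean rst) {
    if (rst) {
      // A reset aborts the session regardless of the current state.
      state = State.FORCED_CLOSED;
      return;
    }
    switch (state) {
      case NONE:
        if (syn && !ack) {
          state = State.SYN;      // initiator sends SYN
        }
        break;
      case SYN:
        if (syn && ack) {
          state = State.SYNACK;   // remote side answers with SYN-ACK
        }
        break;
      case SYNACK:
        if (ack && !syn) {
          state = State.OPEN;     // initiator's ACK completes the handshake
        }
        break;
      default:
        break;                    // OPEN and the closing states are left unchanged here
    }
  }

  /** True once SYN, SYN-ACK and ACK have been seen in order with no RST. */
  boolean isConnected() {
    return state == State.OPEN;
  }

  State currentState() {
    return state;
  }
}
```

Feeding it a SYN, then a SYN-ACK, then an ACK leaves `isConnected()` returning `true`; an RST at any point moves it to `FORCED_CLOSED`. Deriving `isConnected()` from the single state field, rather than keeping separate `syn`/`synAck`/`ack` booleans, means the two representations cannot drift apart.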
[jira] [Commented] (DRILL-7443) Enable PCAP Plugin to Reassemble TCP Streams
[ https://issues.apache.org/jira/browse/DRILL-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979827#comment-16979827 ]

ASF GitHub Bot commented on DRILL-7443:
---------------------------------------

cgivre commented on pull request #1898: DRILL-7443: Enable PCAP Plugin to Reassemble TCP Streams
URL: https://github.com/apache/drill/pull/1898#discussion_r349428999

##
File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/decoder/TcpHandshake.java
##
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.pcap.decoder;
+
+/**
+ * This class is used to record the status of the TCP Handshake. Initially this is used just to determine whether a session is open or closed, but
+ * future functionality could include SYN flood identification, or other hackery with TCP flags.
+ */
+public class TcpHandshake {
+  boolean syn = false;

Review comment:
After some additional code cleanup, I realized they weren't being used. ;-) They're removed.
[jira] [Commented] (DRILL-7443) Enable PCAP Plugin to Reassemble TCP Streams
[ https://issues.apache.org/jira/browse/DRILL-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979829#comment-16979829 ]

ASF GitHub Bot commented on DRILL-7443:
---------------------------------------

cgivre commented on pull request #1898: DRILL-7443: Enable PCAP Plugin to Reassemble TCP Streams
URL: https://github.com/apache/drill/pull/1898#discussion_r349429097

##
File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/decoder/TcpHandshake.java
##
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.pcap.decoder;
+
+/**
+ * This class is used to record the status of the TCP Handshake. Initially this is used just to determine whether a session is open or closed, but
+ * future functionality could include SYN flood identification, or other hackery with TCP flags.
+ */
+public class TcpHandshake {
+  boolean syn = false;
+
+  boolean synAck = false;
+
+  boolean ack = false;
+
+  boolean finAck = false;
+
+  boolean isConnected = false;
+
+  boolean isClosed = false;
+
+  long sessionID;
+
+  State currentSessionState = State.NONE;
+
+  enum State {
+    NONE, OPEN, CLOSED, CLOSE_WAIT, TIME_WAIT, SYN, SYNACK, FORCED_CLOSED, FIN_WAIT
+  }
+
+  /**
+   * Returns true for a correct TCP handshake: SYN|SYNACK|ACK, False if not.
+   *
+   * @return boolean true if the session is open, false if not.
+   */
+  public boolean isConnected() {
+    return isConnected;
+  }
+
+  /**
+   * This function returns true if the session is closed properly via FIN -> FIN ACK, false if not.

Review comment:
Corrected
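The Javadoc corrected in this exchange defines a proper close as a FIN answered by a FIN ACK. A minimal sketch of that check follows, assuming each packet is already tagged with whether it came from the session originator. `CloseTracker` and its methods are hypothetical names, and the rule (a FIN seen from each side and no RST) is a simplification of the full TCP teardown, which also involves final ACKs and the `TIME_WAIT` period.

```java
// Hypothetical sketch: decides whether a TCP session closed cleanly, i.e. each
// side sent a FIN and no RST was seen. Direction tagging and names are
// assumptions for illustration, not Drill's TcpHandshake code.
class CloseTracker {
  private boolean finFromOriginator;
  private boolean finFromRemote;
  private boolean reset;

  /** Records the FIN and RST flags of one packet and which side sent it. */
  void onPacket(boolean fromOriginator, boolean fin, boolean rst) {
    if (rst) {
      reset = true;
    }
    if (fin) {
      if (fromOriginator) {
        finFromOriginator = true;
      } else {
        finFromRemote = true;
      }
    }
  }

  /** True if both sides sent a FIN and the session was never reset. */
  boolean isClosedCleanly() {
    return finFromOriginator && finFromRemote && !reset;
  }
}
```

A session whose capture ends after only one FIN stays half-closed here (`isClosedCleanly()` returns `false`), which is the kind of leftover session a reader might report when the file ends.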
[jira] [Commented] (DRILL-7443) Enable PCAP Plugin to Reassemble TCP Streams
[ https://issues.apache.org/jira/browse/DRILL-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979825#comment-16979825 ]

ASF GitHub Bot commented on DRILL-7443:
---------------------------------------

cgivre commented on pull request #1898: DRILL-7443: Enable PCAP Plugin to Reassemble TCP Streams
URL: https://github.com/apache/drill/pull/1898#discussion_r349428861

##
File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/decoder/Packet.java
##
@@ -21,6 +21,7 @@
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
+import javax.annotation.Nonnull;

Review comment:
Removed
[jira] [Commented] (DRILL-7443) Enable PCAP Plugin to Reassemble TCP Streams
[ https://issues.apache.org/jira/browse/DRILL-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979826#comment-16979826 ]

ASF GitHub Bot commented on DRILL-7443:
---------------------------------------

cgivre commented on pull request #1898: DRILL-7443: Enable PCAP Plugin to Reassemble TCP Streams
URL: https://github.com/apache/drill/pull/1898#discussion_r349428937

##
File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/decoder/Packet.java
##
@@ -491,4 +528,9 @@ private int getPort(int offset) {
     int dstPortOffset = ipOffset + getIPHeaderLength() + offset;
     return convertShort(raw, dstPortOffset);
   }
+
+  @Override
+  public int compareTo(@Nonnull Packet o) {

Review comment:
Removed annotation and added Javadoc.
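The body of `Packet.compareTo` is not shown in this thread. Reassembling a stream needs packets in a well-defined order, so the sketch below assumes ordering by capture timestamp with the TCP sequence number as a tie-break; `PacketSketch` and its fields are hypothetical stand-ins, not Drill's `Packet`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for Drill's Packet class, holding only the fields needed
// to illustrate a Comparable ordering. Ordering by capture timestamp, then by TCP
// sequence number, is an assumption, not the PR's actual compareTo.
class PacketSketch implements Comparable<PacketSketch> {
  final long timestampMicros;   // capture time in microseconds since the epoch
  final long sequenceNumber;    // TCP sequence number; wraparound is ignored in this sketch

  PacketSketch(long timestampMicros, long sequenceNumber) {
    this.timestampMicros = timestampMicros;
    this.sequenceNumber = sequenceNumber;
  }

  /**
   * Orders packets by capture time, breaking ties by sequence number.
   * Long.compare avoids the overflow that subtracting two longs can cause.
   */
  @Override
  public int compareTo(PacketSketch o) {
    int byTime = Long.compare(timestampMicros, o.timestampMicros);
    return byTime != 0 ? byTime : Long.compare(sequenceNumber, o.sequenceNumber);
  }

  /** Returns a sorted copy, leaving the input list untouched. */
  static List<PacketSketch> sorted(List<PacketSketch> packets) {
    List<PacketSketch> copy = new ArrayList<>(packets);
    Collections.sort(copy);
    return copy;
  }
}
```

Dropping `@Nonnull` loses no runtime behavior: the `Comparable` contract already says `compareTo(null)` should throw `NullPointerException`, which this sketch does as soon as it reads `o.timestampMicros`.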
[jira] [Commented] (DRILL-7443) Enable PCAP Plugin to Reassemble TCP Streams
[ https://issues.apache.org/jira/browse/DRILL-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979824#comment-16979824 ]

ASF GitHub Bot commented on DRILL-7443:
---------------------------------------

cgivre commented on pull request #1898: DRILL-7443: Enable PCAP Plugin to Reassemble TCP Streams
URL: https://github.com/apache/drill/pull/1898#discussion_r349428748

##
File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/PcapBatchReader.java
##
@@ -190,6 +199,8 @@ public boolean next() {
   @Override
   public void close() {
+    logger.warn("Unclosed sessions remaining in PCAP");

Review comment:
I added an if statement here.
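Per this reply, the unconditional `logger.warn(...)` in `close()` is now guarded by an if statement, and another reply in this thread explains that the warning appears for an incomplete or corrupt PCAP file. A minimal sketch of such a guard follows, assuming open sessions are held in a map keyed by session ID. It uses `java.util.logging` in place of the SLF4J logger in the diff so it runs on the JDK alone; `SessionReaderSketch` and `openSessions` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical sketch of a reader's close(): warn only when TCP sessions are
// still open at end of input (e.g. a truncated capture). java.util.logging
// stands in for the SLF4J logger used in Drill.
class SessionReaderSketch {
  private static final Logger logger = Logger.getLogger(SessionReaderSketch.class.getName());

  // Sessions that have not been closed yet, keyed by session ID.
  final Map<Long, String> openSessions = new HashMap<>();

  /** Returns the number of sessions left open, logging a warning only if there are any. */
  int close() {
    int remaining = openSessions.size();
    if (remaining > 0) {
      logger.warning("Unclosed sessions remaining in PCAP: " + remaining);
    }
    openSessions.clear();
    return remaining;
  }
}
```

With the guard, a clean capture closes silently, and the warning (with a count) is reserved for files that end while sessions are still open.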
[jira] [Commented] (DRILL-7443) Enable PCAP Plugin to Reassemble TCP Streams
[ https://issues.apache.org/jira/browse/DRILL-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979823#comment-16979823 ]

ASF GitHub Bot commented on DRILL-7443:
---------------------------------------

cgivre commented on pull request #1898: DRILL-7443: Enable PCAP Plugin to Reassemble TCP Streams
URL: https://github.com/apache/drill/pull/1898#discussion_r349428683

##
File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/PcapBatchReader.java
##
@@ -190,6 +199,8 @@ public boolean next() {
   @Override
   public void close() {
+    logger.warn("Unclosed sessions remaining in PCAP");

Review comment:
This warning happens in the event of an incomplete or corrupt PCAP file.
[jira] [Commented] (DRILL-7443) Enable PCAP Plugin to Reassemble TCP Streams
[ https://issues.apache.org/jira/browse/DRILL-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979822#comment-16979822 ]

ASF GitHub Bot commented on DRILL-7443:
---------------------------------------

cgivre commented on pull request #1898: DRILL-7443: Enable PCAP Plugin to Reassemble TCP Streams
URL: https://github.com/apache/drill/pull/1898#discussion_r349428496

##
File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/PcapBatchReader.java
##
@@ -116,64 +118,71 @@
   private ScalarWriter isCorruptWriter;
+  private PcapReaderConfig readerConfig;
+
+
+  // Writers for TCP Sessions
+  private ScalarWriter sessionStartTimeWriter;
+
+  private ScalarWriter sessionEndTimeWriter;
+
+  private ScalarWriter sessionDurationWriter;
+
+  private ScalarWriter connectionTimeWriter;
+
+  private ScalarWriter packetCountWriter;
+
+  private ScalarWriter originPacketCounterWriter;
+
+  private ScalarWriter remotePacketCounterWriter;
+
+  private ScalarWriter originDataVolumeWriter;
+
+  private ScalarWriter remoteDataVolumeWriter;
+
+  private ScalarWriter hostDataWriter;
+
+  private ScalarWriter remoteDataWriter;
+
+
+  private HashMap sessionQueue;

Review comment:
Fixed
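The diff declares `private HashMap sessionQueue;`; any type arguments it had are not visible in this archive. One plausible use of such a map for sessionization is sketched below: key it by session ID, accumulate per-direction counters as packets arrive, and remove the entry when the session ends. `SessionTable`, `SessionStats` and the method signatures are assumptions for illustration, not the PR's code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a session table keyed by TCP session ID. It accumulates
// per-direction packet and byte counts (in the spirit of the packet_count_from_origin
// and data_volume_from_remote columns) and hands back a session's totals when it ends.
class SessionTable {

  static final class SessionStats {
    long packetsFromOrigin;
    long packetsFromRemote;
    long bytesFromOrigin;
    long bytesFromRemote;

    long totalPackets() {
      return packetsFromOrigin + packetsFromRemote;
    }
  }

  // Declared against the Map interface; HashMap is one implementation choice.
  private final Map<Long, SessionStats> sessionQueue = new HashMap<>();

  /** Adds one packet's payload size to its session, creating the entry on first sight. */
  void onPacket(long sessionId, boolean fromOrigin, int payloadBytes) {
    SessionStats stats = sessionQueue.computeIfAbsent(sessionId, id -> new SessionStats());
    if (fromOrigin) {
      stats.packetsFromOrigin++;
      stats.bytesFromOrigin += payloadBytes;
    } else {
      stats.packetsFromRemote++;
      stats.bytesFromRemote += payloadBytes;
    }
  }

  /** Removes and returns a finished session's totals, or null if the ID is unknown. */
  SessionStats finish(long sessionId) {
    return sessionQueue.remove(sessionId);
  }

  int openSessionCount() {
    return sessionQueue.size();
  }
}
```

Typing the field as `Map<Long, SessionStats>` and using `computeIfAbsent` keeps lookup-or-create to one call; whatever remains in the map when input ends is exactly the set of unclosed sessions.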
[jira] [Commented] (DRILL-7453) Update joda-time to 2.10.5 to have correct time zone info
[ https://issues.apache.org/jira/browse/DRILL-7453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979485#comment-16979485 ]

ASF GitHub Bot commented on DRILL-7453:
---------------------------------------

arina-ielchiieva commented on issue #1905: DRILL-7453: Update joda-time to 2.10.5 to have correct time zone info
URL: https://github.com/apache/drill/pull/1905#issuecomment-55729

+1

> Update joda-time to 2.10.5 to have correct time zone info
> ---------------------------------------------------------
>
> Key: DRILL-7453
> URL: https://issues.apache.org/jira/browse/DRILL-7453
> Project: Apache Drill
> Issue Type: Task
> Affects Versions: 1.16.0
> Reporter: Bohdan Kazydub
> Assignee: Bohdan Kazydub
> Priority: Major
> Fix For: 1.17.0
>
>
> As Brazil decided not to follow the DST changes for 2019
> (https://www.timeanddate.com/news/time/brazil-scraps-dst.html), update
> joda-time to the latest {{2.10.5}} version, which contains the most recent
> tzdb info.
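Time zone rules are data, so a library bundling pre-2019 rules computes the wrong UTC offset for Brazil after its DST change. The sketch below checks this with the JDK's own `java.time` zone data rather than Joda-Time (a deliberate swap so it runs without dependencies), and it assumes a JDK whose bundled tz data already includes the 2019 Brazil change; `BrazilDstCheck` is a hypothetical name. With Joda-Time, the corresponding lookup would start from `DateTimeZone.forID("America/Sao_Paulo")`.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.zone.ZoneRules;

// Sketch: checks whether the JVM's bundled time zone data reflects Brazil's
// 2019 decision to stop observing DST. Uses java.time instead of Joda-Time.
class BrazilDstCheck {

  /** Returns Sao Paulo's UTC offset at the given instant according to the JDK's tz data. */
  static ZoneOffset saoPauloOffset(Instant instant) {
    ZoneRules rules = ZoneId.of("America/Sao_Paulo").getRules();
    return rules.getOffset(instant);
  }

  /** True if the tz data has no DST for Sao Paulo in December 2019. */
  static boolean hasPost2019Rules() {
    // Under the pre-2019 rules, DST would have put Sao Paulo on UTC-02:00 here.
    Instant midSummer = Instant.parse("2019-12-15T12:00:00Z");
    return saoPauloOffset(midSummer).equals(ZoneOffset.ofHours(-3));
  }
}
```

On stale data, `hasPost2019Rules()` returns `false` because December 2019 falls inside the old DST window; the same class of error is what the joda-time upgrade fixes for Drill's own date handling.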
[jira] [Updated] (DRILL-7453) Update joda-time to 2.10.5 to have correct time zone info
[ https://issues.apache.org/jira/browse/DRILL-7453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arina Ielchiieva updated DRILL-7453:
------------------------------------
    Labels: ready-to-commit  (was: )
[jira] [Commented] (DRILL-7453) Update joda-time to 2.10.5 to have correct time zone info
[ https://issues.apache.org/jira/browse/DRILL-7453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979473#comment-16979473 ]

ASF GitHub Bot commented on DRILL-7453:
---------------------------------------

KazydubB commented on pull request #1905: DRILL-7453: Update joda-time to 2.10.5 to have correct time zone info
URL: https://github.com/apache/drill/pull/1905

Jira - [DRILL-7453](https://issues.apache.org/jira/browse/DRILL-7453)
[jira] [Updated] (DRILL-7453) Update joda-time to 2.10.5 to have correct time zone info
[ https://issues.apache.org/jira/browse/DRILL-7453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-7453: Affects Version/s: 1.16.0
[jira] [Updated] (DRILL-7453) Update joda-time to 2.10.5 to have correct time zone info
[ https://issues.apache.org/jira/browse/DRILL-7453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-7453: Reviewer: Arina Ielchiieva
[jira] [Updated] (DRILL-7453) Update joda-time to 2.10.5 to have correct time zone info
[ https://issues.apache.org/jira/browse/DRILL-7453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-7453: Fix Version/s: 1.17.0
[jira] [Updated] (DRILL-7453) Update joda-time to 2.10.5 to have correct time zone info
[ https://issues.apache.org/jira/browse/DRILL-7453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-7453: Issue Type: Task (was: Bug)
[jira] [Created] (DRILL-7453) Update joda-time to 2.10.5 to have correct time zone info
Bohdan Kazydub created DRILL-7453: - Summary: Update joda-time to 2.10.5 to have correct time zone info Key: DRILL-7453 URL: https://issues.apache.org/jira/browse/DRILL-7453 Project: Apache Drill Issue Type: Bug Reporter: Bohdan Kazydub Assignee: Bohdan Kazydub Because Brazil decided to scrap daylight saving time (DST) starting in 2019 (https://www.timeanddate.com/news/time/brazil-scraps-dst.html), update joda-time to the latest version, {{2.10.5}}, which contains the most recent tzdb data.
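Whether a JVM's time zone data already reflects the Brazilian change can be checked by looking up the UTC offset for `America/Sao_Paulo` on a date that would have fallen inside DST under the old rules. The sketch below uses `java.time` rather than joda-time so that it runs on a plain JDK. Note that `java.time` reads the JDK's bundled tzdb, which is updated independently of the copy shipped inside joda-time; with joda-time itself the equivalent lookup is `DateTimeZone.forID("America/Sao_Paulo").getOffset(instantMillis)`. The class name and the chosen dates are illustrative assumptions.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;

/**
 * Sketch: checks whether the JVM's time zone data knows that Brazil scrapped DST in 2019.
 * Uses java.time (the JDK's own tzdb), not joda-time, so it needs no extra dependency.
 */
public class BrazilDstCheck {

    private static final ZoneId SAO_PAULO = ZoneId.of("America/Sao_Paulo");

    /** UTC offset of America/Sao_Paulo at the given instant, per the JVM's tz rules. */
    static ZoneOffset saoPauloOffset(Instant instant) {
        return SAO_PAULO.getRules().getOffset(instant);
    }

    /**
     * Under the pre-2019 rules, mid-December 2019 would fall inside DST (UTC-2);
     * up-to-date rules keep Sao Paulo on standard time (UTC-3).
     */
    static boolean hasCurrentBrazilRules() {
        Instant midDecember2019 = Instant.parse("2019-12-15T12:00:00Z");
        return saoPauloOffset(midDecember2019).equals(ZoneOffset.ofHours(-3));
    }

    public static void main(String[] args) {
        // January 2019 was inside the 2018/2019 DST period under both old and new rules.
        System.out.println("2019-01-15: " + saoPauloOffset(Instant.parse("2019-01-15T12:00:00Z")));
        System.out.println("2019-12-15: " + saoPauloOffset(Instant.parse("2019-12-15T12:00:00Z")));
        System.out.println(hasCurrentBrazilRules()
            ? "Time zone data reflects the 2019 change"
            : "Time zone data predates the 2019 change; update it");
    }
}
```

Comparing against a date inside the old DST window is what makes the check meaningful: offsets outside any DST period are the same under old and new data.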
[jira] [Commented] (DRILL-7443) Enable PCAP Plugin to Reassemble TCP Streams
[ https://issues.apache.org/jira/browse/DRILL-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979433#comment-16979433 ] ASF GitHub Bot commented on DRILL-7443: --- arina-ielchiieva commented on pull request #1898: DRILL-7443: Enable PCAP Plugin to Reassemble TCP Streams URL: https://github.com/apache/drill/pull/1898#discussion_r349216788 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/decoder/TcpHandshake.java ## @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.drill.exec.store.pcap.decoder; + +/** + * This class is used to record the status of the TCP Handshake. Initially this is used just to determine whether a session is open or closed, but + * future functionality could include SYN flood identification, or other hackery with TCP flags. + */ +public class TcpHandshake { + boolean syn = false; Review comment: I guess this should have been done before submitting the PR... > Enable PCAP Plugin to Reassemble TCP Streams > > > Key: DRILL-7443 > URL: https://issues.apache.org/jira/browse/DRILL-7443 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Other >Affects Versions: 1.16.0 >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Major > Fix For: 1.17.0 > > > One common task in network forensics is reassembling TCP streams from > captured network data. This PR adds this capability to Drill. > h2. Usage > To enable TCP re-sessionization, set the {{sessionizeTCPStreams}} option to > {{true}} in the configuration for the PCAP reader. > This can also be done at query time with the {{table()}} function: > {{SELECT * FROM table(dfs.test.`attack-trace.pcap` (type => 'pcap', > sessionizeTCPStreams=> true))}} > h3. Results > *When this option is enabled, Drill will ignore all packets that are not TCP > packets.* > Enabling this option changes the results Drill returns from PCAP files. > You will get the following columns: > * session_start_time: The start time of the session > * session_end_time: The end time of the session > * session_duration: The duration of the session. This will be a Drill PERIOD > datatype. > * total_packet_count: The number of packets in the session > * connection_time: The time it took to complete the TCP handshake. Useful > for network diagnostics > * src_ip: The IP address of the initiating machine > * dst_ip: The IP address of the remote machine > * src_port: The port of the originating machine > * dst_port: The port of the remote machine > * src_mac_address: The MAC address of the originating machine > * dst_mac_address: The MAC address of the remote machine > * tcp_session: The session hash for the TCP session (Long) > * is_corrupt: True if the session contains corrupted packets, false otherwise > * data_from_originator: The data sent from the originator > * data_from_remote: The data sent from the remote machine > * data_volume_from_remote: The number of bytes sent from the remote host > * data_volume_from_origin: The number of bytes sent from the originating > machine > * packet_count_from_origin: The number of packets sent from the originating > machine > * packet_count_from_remote: The number of packets sent from the remote > machine > >
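The core of reassembling TCP streams is that both directions of one connection must land in the same bucket. A minimal sketch, assuming packets have already been decoded into a simple record: it drops non-TCP packets (IP protocol number 6 is TCP), builds a direction-independent key from the two endpoints, and accumulates a few of the per-session counters listed above. `TcpSessionizer`, `Packet`, and `Session` are hypothetical names and this is not the plugin's actual implementation; in particular, it treats the sender of the first packet seen as the originator, whereas a real implementation would more likely rely on the SYN flag.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of TCP sessionization; not the Drill PCAP plugin's code. */
public class TcpSessionizer {

    static final int IP_PROTOCOL_TCP = 6; // IANA protocol number for TCP

    /** Minimal stand-in for a decoded packet (hypothetical fields). */
    static final class Packet {
        final int protocol;
        final String srcIp;
        final int srcPort;
        final String dstIp;
        final int dstPort;
        final int payloadBytes;

        Packet(int protocol, String srcIp, int srcPort, String dstIp, int dstPort, int payloadBytes) {
            this.protocol = protocol;
            this.srcIp = srcIp;
            this.srcPort = srcPort;
            this.dstIp = dstIp;
            this.dstPort = dstPort;
            this.payloadBytes = payloadBytes;
        }
    }

    /** Per-session counters mirroring a few of the documented output columns. */
    static final class Session {
        final String originIp;
        final int originPort;
        long packetCountFromOrigin;
        long packetCountFromRemote;
        long dataVolumeFromOrigin; // payload bytes
        long dataVolumeFromRemote; // payload bytes

        Session(String originIp, int originPort) {
            this.originIp = originIp;
            this.originPort = originPort;
        }

        long totalPacketCount() {
            return packetCountFromOrigin + packetCountFromRemote;
        }
    }

    /** Same key for A->B and B->A; brackets keep IPv6 addresses unambiguous. */
    static String sessionKey(Packet p) {
        String a = "[" + p.srcIp + "]:" + p.srcPort;
        String b = "[" + p.dstIp + "]:" + p.dstPort;
        return a.compareTo(b) <= 0 ? a + "|" + b : b + "|" + a;
    }

    /** Groups TCP packets into sessions in first-seen order; non-TCP packets are skipped. */
    static Map<String, Session> sessionize(List<Packet> packets) {
        Map<String, Session> sessions = new LinkedHashMap<>();
        for (Packet p : packets) {
            if (p.protocol != IP_PROTOCOL_TCP) {
                continue;
            }
            // Simplification: the sender of the first packet seen is treated as the originator.
            Session s = sessions.computeIfAbsent(sessionKey(p), k -> new Session(p.srcIp, p.srcPort));
            if (p.srcIp.equals(s.originIp) && p.srcPort == s.originPort) {
                s.packetCountFromOrigin++;
                s.dataVolumeFromOrigin += p.payloadBytes;
            } else {
                s.packetCountFromRemote++;
                s.dataVolumeFromRemote += p.payloadBytes;
            }
        }
        return sessions;
    }

    public static void main(String[] args) {
        List<Packet> capture = List.of(
            new Packet(6, "10.0.0.1", 51000, "10.0.0.2", 80, 0),    // SYN
            new Packet(6, "10.0.0.2", 80, "10.0.0.1", 51000, 0),    // SYN-ACK
            new Packet(6, "10.0.0.1", 51000, "10.0.0.2", 80, 120),  // request
            new Packet(6, "10.0.0.2", 80, "10.0.0.1", 51000, 900),  // response
            new Packet(17, "10.0.0.1", 53000, "10.0.0.3", 53, 40)); // UDP: ignored
        sessionize(capture).forEach((key, s) -> System.out.println(key
            + " total_packet_count=" + s.totalPacketCount()
            + " data_volume_from_origin=" + s.dataVolumeFromOrigin
            + " data_volume_from_remote=" + s.dataVolumeFromRemote));
    }
}
```

Sorting the two endpoint strings is just one cheap way to get a canonical key; hashing the ordered pair would serve the same purpose if a numeric session identifier is wanted.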
[jira] [Commented] (DRILL-7443) Enable PCAP Plugin to Reassemble TCP Streams
[ https://issues.apache.org/jira/browse/DRILL-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979432#comment-16979432 ] ASF GitHub Bot commented on DRILL-7443: --- cgivre commented on pull request #1898: DRILL-7443: Enable PCAP Plugin to Reassemble TCP Streams URL: https://github.com/apache/drill/pull/1898#discussion_r349216051 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/decoder/TcpHandshake.java ## @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.drill.exec.store.pcap.decoder; + +/** + * This class is used to record the status of the TCP Handshake. Initially this is used just to determine whether a session is open or closed, but + * future functionality could include SYN flood identification, or other hackery with TCP flags. + */ +public class TcpHandshake { + boolean syn = false; Review comment: I ended up removing all these because they aren't necessary.
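The `TcpHandshake` Javadoc in the diff above describes recording handshake status to tell whether a session is open or closed, and the `connection_time` column measures how long the handshake took. A minimal state-machine sketch of that idea, assuming the caller already knows each packet's TCP flag byte and whether it came from the initiating host: `HandshakeTracker` and its methods are hypothetical names and do not reproduce the fields under review in the PR. The flag bit values are the standard TCP header bits.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical sketch of tracking a TCP three-way handshake for one session. */
public class HandshakeTracker {

    // Standard TCP header flag bits (RFC 793).
    static final int FIN = 0x01;
    static final int SYN = 0x02;
    static final int RST = 0x04;
    static final int ACK = 0x10;

    enum State { NONE, SYN_SENT, SYN_RECEIVED, ESTABLISHED, CLOSED }

    private State state = State.NONE;
    private Instant synTime;
    private Duration connectionTime; // stays null until the handshake completes

    /** Feeds one packet's flag byte; fromOrigin is true if the initiating host sent it. */
    void onPacket(int flags, boolean fromOrigin, Instant timestamp) {
        // Simplification: any FIN or RST marks the session closed (no half-close tracking).
        if ((flags & (FIN | RST)) != 0) {
            state = State.CLOSED;
            return;
        }
        int synAck = flags & (SYN | ACK); // ignore PSH, URG, ECE, CWR, etc.
        switch (state) {
            case NONE:
                if (fromOrigin && synAck == SYN) {
                    state = State.SYN_SENT;
                    synTime = timestamp;
                }
                break;
            case SYN_SENT:
                if (!fromOrigin && synAck == (SYN | ACK)) {
                    state = State.SYN_RECEIVED;
                }
                break;
            case SYN_RECEIVED:
                if (fromOrigin && synAck == ACK) {
                    state = State.ESTABLISHED;
                    connectionTime = Duration.between(synTime, timestamp);
                }
                break;
            default:
                break; // ESTABLISHED or CLOSED: nothing more to track in this sketch
        }
    }

    State state() { return state; }

    boolean isClosed() { return state == State.CLOSED; }

    /** Time from the originator's SYN to its final ACK, or null if not established. */
    Duration connectionTime() { return connectionTime; }

    public static void main(String[] args) {
        HandshakeTracker t = new HandshakeTracker();
        Instant t0 = Instant.parse("2019-11-21T10:00:00Z");
        t.onPacket(SYN, true, t0);
        t.onPacket(SYN | ACK, false, t0.plusMillis(20));
        t.onPacket(ACK, true, t0.plusMillis(45));
        System.out.println(t.state() + ", connection time " + t.connectionTime());
        t.onPacket(FIN | ACK, true, t0.plusSeconds(2));
        System.out.println("closed: " + t.isClosed());
    }
}
```

Masking down to the SYN and ACK bits before comparing keeps the state machine from rejecting handshakes that also carry ECN bits.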
[jira] [Commented] (DRILL-6540) Upgrade to HADOOP-3.0 libraries
[ https://issues.apache.org/jira/browse/DRILL-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979387#comment-16979387 ] ASF GitHub Bot commented on DRILL-6540: --- arina-ielchiieva commented on issue #1895: DRILL-6540: Upgrade to HADOOP-3.x libraries URL: https://github.com/apache/drill/pull/1895#issuecomment-551847835 By @vvysotskyi: Here is the list of things to be done before merging this PR: - [x] Clean up poms: - - [x] Remove commented-out lines, revise new dependencies to reduce JDBC driver size - - [x] Restore the `commons-logging` exclusion (revert its removal); it is better to add the missing mock class from `commons-logging` to Drill - [x] Remove change with logging test error in `ITTestShadedJar.java` - [x] Test Drill on a Hadoop 3 cluster (secure and non-secure modes) - [x] Check with Drill-on-YARN - [x] Test Drill in embedded mode on Windows (test Hadoop win-utils) - [x] Test the JDBC client with the new Hadoop version - [x] Check whether the new Hadoop version's API is compatible with the current one; if not, decide whether to introduce a new profile, use a property to specify the version at build time, or document the supported versions > Upgrade to HADOOP-3.0 libraries > > > Key: DRILL-6540 > URL: https://issues.apache.org/jira/browse/DRILL-6540 > Project: Apache Drill > Issue Type: Improvement > Components: Tools, Build Test >Affects Versions: 1.14.0 >Reporter: Vitalii Diravka >Assignee: Anton Gozhiy >Priority: Major > Fix For: 1.18.0 > > > Currently Drill uses version 2.7.4 of the Hadoop libraries (hadoop-common, > hadoop-hdfs, hadoop-annotations, hadoop-aws, hadoop-yarn-api, hadoop-client, > hadoop-yarn-client). > [Hadoop 3.0|https://hadoop.apache.org/docs/r3.0.0/index.html] was released a year > ago and was recently updated to [Hadoop 3.2.0|https://hadoop.apache.org/docs/r3.2.0/]. > This upgrade is needed to run Drill on a Hadoop 3.0 distribution. The newer > version also includes new features that could be useful for Drill. > The upgrade is also needed to use the newest ZooKeeper libraries and Hive 3.1.
[jira] [Commented] (DRILL-7443) Enable PCAP Plugin to Reassemble TCP Streams
[ https://issues.apache.org/jira/browse/DRILL-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979300#comment-16979300 ] ASF GitHub Bot commented on DRILL-7443: --- cgivre commented on pull request #1898: DRILL-7443: Enable PCAP Plugin to Reassemble TCP Streams URL: https://github.com/apache/drill/pull/1898#discussion_r349101235 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/PcapRecordReader.java ## @@ -136,7 +136,7 @@ public void close() throws Exception { } private ImmutableList getProjectedColsIfItNull() { -return projectedCols != null ? projectedCols : initCols(new Schema()); +return projectedCols != null ? projectedCols : initCols(new Schema(false)); Review comment: Actually, never mind. I'm deleting this class, because it was replaced by the `PcapBatchReader` when I updated this to use EVF.
[jira] [Commented] (DRILL-7443) Enable PCAP Plugin to Reassemble TCP Streams
[ https://issues.apache.org/jira/browse/DRILL-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979295#comment-16979295 ] ASF GitHub Bot commented on DRILL-7443: --- cgivre commented on pull request #1898: DRILL-7443: Enable PCAP Plugin to Reassemble TCP Streams URL: https://github.com/apache/drill/pull/1898#discussion_r349098515 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/PcapRecordReader.java ## @@ -136,7 +136,7 @@ public void close() throws Exception { } private ImmutableList getProjectedColsIfItNull() { -return projectedCols != null ? projectedCols : initCols(new Schema()); +return projectedCols != null ? projectedCols : initCols(new Schema(false)); Review comment: I don't know. I'm going to deprecate this class because it actually is no longer needed.
[jira] [Commented] (DRILL-7443) Enable PCAP Plugin to Reassemble TCP Streams
[ https://issues.apache.org/jira/browse/DRILL-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979208#comment-16979208 ] ASF GitHub Bot commented on DRILL-7443: --- arina-ielchiieva commented on pull request #1898: DRILL-7443: Enable PCAP Plugin to Reassemble TCP Streams URL: https://github.com/apache/drill/pull/1898#discussion_r349043549 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/schema/PcapTypes.java ## @@ -22,5 +22,6 @@ INTEGER, STRING, LONG, - TIMESTAMP + TIMESTAMP, + DURATION Review comment: Please add a newline at the end of the file.
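The review above touches the new `DURATION` entry in `PcapTypes`, which backs the `session_duration` column. A minimal sketch of how such a value could be derived with `java.time.Duration`, assuming a session's start and end times are simply its earliest and latest packet timestamps; `SessionDuration` is a hypothetical name, and how the plugin maps the value onto Drill's interval type is not shown here.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

/** Hypothetical sketch: derive a session's duration from its packet timestamps. */
public class SessionDuration {

    /** Earliest-to-latest span of the timestamps; Duration.ZERO if there are fewer than two. */
    static Duration sessionDuration(List<Instant> packetTimes) {
        if (packetTimes.size() < 2) {
            return Duration.ZERO;
        }
        Instant start = packetTimes.get(0);
        Instant end = start;
        for (Instant t : packetTimes) {
            if (t.isBefore(start)) {
                start = t; // candidate session_start_time
            }
            if (t.isAfter(end)) {
                end = t;   // candidate session_end_time
            }
        }
        return Duration.between(start, end);
    }

    public static void main(String[] args) {
        List<Instant> times = List.of(
            Instant.parse("2019-11-21T10:00:00Z"),
            Instant.parse("2019-11-21T10:01:30.500Z"),
            Instant.parse("2019-11-21T10:00:10Z")); // timestamps are not assumed to be sorted
        Duration d = sessionDuration(times);
        // Duration.toString() prints an ISO-8601 duration.
        System.out.println("session_duration = " + d + " (" + d.toMillis() + " ms)");
    }
}
```

Scanning for the minimum and maximum instead of taking the first and last packet keeps the result correct even if a capture's timestamps are slightly out of order.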
[jira] [Updated] (DRILL-7443) Enable PCAP Plugin to Reassemble TCP Streams
[ https://issues.apache.org/jira/browse/DRILL-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-7443: Reviewer: Arina Ielchiieva
[jira] [Commented] (DRILL-7443) Enable PCAP Plugin to Reassemble TCP Streams
[ https://issues.apache.org/jira/browse/DRILL-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979207#comment-16979207 ] ASF GitHub Bot commented on DRILL-7443: --- arina-ielchiieva commented on pull request #1898: DRILL-7443: Enable PCAP Plugin to Reassemble TCP Streams URL: https://github.com/apache/drill/pull/1898#discussion_r349043436 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/PcapRecordReader.java ## @@ -136,7 +136,7 @@ public void close() throws Exception { } private ImmutableList getProjectedColsIfItNull() { -return projectedCols != null ? projectedCols : initCols(new Schema()); +return projectedCols != null ? projectedCols : initCols(new Schema(false)); Review comment: Please add a comment explaining why it is always false here.
[jira] [Commented] (DRILL-7443) Enable PCAP Plugin to Reassemble TCP Streams
[ https://issues.apache.org/jira/browse/DRILL-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979194#comment-16979194 ]

ASF GitHub Bot commented on DRILL-7443:
---

arina-ielchiieva commented on pull request #1898: DRILL-7443: Enable PCAP Plugin to Reassemble TCP Streams
URL: https://github.com/apache/drill/pull/1898#discussion_r349039869

## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/decoder/Packet.java
##
@@ -21,6 +21,7 @@
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
+import javax.annotation.Nonnull;

Review comment:
Please do not use javax annotations.
[jira] [Commented] (DRILL-7443) Enable PCAP Plugin to Reassemble TCP Streams
[ https://issues.apache.org/jira/browse/DRILL-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979197#comment-16979197 ]

ASF GitHub Bot commented on DRILL-7443:
---

arina-ielchiieva commented on pull request #1898: DRILL-7443: Enable PCAP Plugin to Reassemble TCP Streams
URL: https://github.com/apache/drill/pull/1898#discussion_r349039044

## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/PcapBatchReader.java
##
@@ -190,6 +199,8 @@ public boolean next() {
 @Override
 public void close() {
+logger.warn("Unclosed sessions remaining in PCAP");

Review comment:
Could you please explain this warning? How can it be avoided?
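The diff logs the warning unconditionally in `close()`. One way to make it both explainable and avoidable, assuming the reader keeps still-open sessions in a map keyed by session hash, is to warn only when entries actually remain and to report how many. This is a minimal sketch, not the PR's code: `OpenSessionTracker` and its methods are hypothetical names, and `java.util.logging` stands in for the SLF4J logger Drill uses.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.logging.Logger;

/**
 * Hypothetical helper (not part of Drill): remembers TCP sessions that were
 * opened but never saw a teardown, so close() only warns when there really
 * are leftovers and can say how many.
 */
class OpenSessionTracker {
  private static final Logger logger = Logger.getLogger(OpenSessionTracker.class.getName());

  // Keyed by session hash; insertion order keeps leftovers in capture order.
  private final Map<Long, String> openSessions = new LinkedHashMap<>();

  /** Records a session as open; a repeated hash keeps the first description. */
  void markOpen(long sessionHash, String description) {
    openSessions.putIfAbsent(sessionHash, description);
  }

  /** Forgets a session once its teardown has been observed. */
  void markClosed(long sessionHash) {
    openSessions.remove(sessionHash);
  }

  /**
   * Intended to be called from the reader's close(): returns the sessions that
   * were still open at the end of the capture and logs one warning if any exist.
   */
  List<String> drainUnclosed() {
    List<String> leftovers = new ArrayList<>(openSessions.values());
    if (!leftovers.isEmpty()) {
      logger.warning(leftovers.size() + " TCP session(s) still open at end of capture: " + leftovers);
    }
    openSessions.clear();
    return leftovers;
  }
}
```

Sessions left in the map are most likely connections whose FIN exchange was not captured (for example, the capture stopped mid-connection), so under this assumption the warning would describe the input file rather than a reader bug.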
[jira] [Commented] (DRILL-7443) Enable PCAP Plugin to Reassemble TCP Streams
[ https://issues.apache.org/jira/browse/DRILL-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979203#comment-16979203 ]

ASF GitHub Bot commented on DRILL-7443:
---

arina-ielchiieva commented on pull request #1898: DRILL-7443: Enable PCAP Plugin to Reassemble TCP Streams
URL: https://github.com/apache/drill/pull/1898#discussion_r349040898

## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/decoder/TcpHandshake.java
##
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.pcap.decoder;
+
+/**
+ * This class is used to record the status of the TCP Handshake. Initially this is used just to determine whether a session is open or closed, but
+ * future functionality could include SYN flood identification, or other hackery with TCP flags.
+ */
+public class TcpHandshake {
+ boolean syn = false;
+
+ boolean synAck = false;
+
+ boolean ack = false;
+
+ boolean finAck = false;
+
+ boolean isConnected = false;
+
+ boolean isClosed = false;
+
+ long sessionID;
+
+ State currentSessionState = State.NONE;
+
+ enum State {
+NONE, OPEN, CLOSED, CLOSE_WAIT, TIME_WAIT, SYN, SYNACK, FORCED_CLOSED, FIN_WAIT
+ }
+
+ /**
+ * Returns true for a correct TCP handshake: SYN|SYNACK|ACK, False if not.
+ *
+ * @return boolean true if the session is open, false if not.
+ */
+ public boolean isConnected() {
+return isConnected;
+ }
+
+ /**
+ * This function returns true if the session is closed properly via FIN -> FIN ACK, false if not.

Review comment:
```suggestion
 * This method returns true if the session is closed properly via FIN -> FIN ACK, false if not.
```
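The Javadoc quoted above defines "connected" as a SYN, SYN/ACK, ACK sequence and "closed" as a FIN-based teardown. The sketch below shows one simplified way to derive both answers from per-packet flags. It is an illustration under stated assumptions, not the PR's implementation: `HandshakeSketch` and `onPacket` are hypothetical names, and it deliberately ignores sequence and acknowledgement numbers, RST, simultaneous open, and retransmissions.

```java
/**
 * Simplified, hypothetical tracker for the TCP three-way handshake and the
 * FIN-based teardown. It only looks at flags and packet direction, so it is
 * a sketch rather than a full TCP state machine.
 */
class HandshakeSketch {
  private boolean synSeen;           // step 1: SYN from the originator
  private boolean synAckSeen;        // step 2: SYN/ACK from the responder
  private boolean connected;         // step 3: ACK from the originator
  private boolean finFromOriginator;
  private boolean finFromResponder;
  private boolean closed;

  /**
   * Feeds one packet's flags into the tracker.
   *
   * @param fromOriginator true if the packet was sent by the side that sent the first SYN
   */
  void onPacket(boolean fromOriginator, boolean syn, boolean ack, boolean fin) {
    if (!connected) {
      if (syn && !ack && fromOriginator) {
        synSeen = true;
      } else if (syn && ack && !fromOriginator && synSeen) {
        synAckSeen = true;
      } else if (ack && !syn && fromOriginator && synAckSeen) {
        connected = true;
      }
      return;
    }
    if (fin) {
      // Each side announces that it has finished sending.
      if (fromOriginator) {
        finFromOriginator = true;
      } else {
        finFromResponder = true;
      }
    } else if (ack && finFromOriginator && finFromResponder) {
      // The ACK of the second FIN completes the teardown.
      closed = true;
    }
  }

  boolean isConnected() {
    return connected;
  }

  boolean isClosed() {
    return closed;
  }
}
```

Keeping "connected" and "closed" as separately derived facts, rather than one overloaded flag, mirrors the split between `isConnected()` and the closing check described in the Javadoc.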
[jira] [Commented] (DRILL-7443) Enable PCAP Plugin to Reassemble TCP Streams
[ https://issues.apache.org/jira/browse/DRILL-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979198#comment-16979198 ]

ASF GitHub Bot commented on DRILL-7443:
---

arina-ielchiieva commented on pull request #1898: DRILL-7443: Enable PCAP Plugin to Reassemble TCP Streams
URL: https://github.com/apache/drill/pull/1898#discussion_r349040725

## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/decoder/TcpHandshake.java
##
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.pcap.decoder;
+
+/**
+ * This class is used to record the status of the TCP Handshake. Initially this is used just to determine whether a session is open or closed, but
+ * future functionality could include SYN flood identification, or other hackery with TCP flags.
+ */
+public class TcpHandshake {
+ boolean syn = false;
+
+ boolean synAck = false;
+
+ boolean ack = false;
+
+ boolean finAck = false;
+
+ boolean isConnected = false;
+
+ boolean isClosed = false;
+
+ long sessionID;
+
+ State currentSessionState = State.NONE;
+
+ enum State {

Review comment:
1. Please add Javadoc describing each state.
2. Move the enum to the end of the class.
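A layout sketch of what the review asks for: fields first, then accessors, and the `State` enum last with a Javadoc line per constant. The constants keep the order from the diff, so `ordinal()` values are unchanged. The class name is hypothetical, and the per-state descriptions are assumptions inferred from the constant names and RFC 793 terminology, not meanings confirmed by the PR author.

```java
/**
 * Layout sketch only (hypothetical class name): fields, then methods, then the
 * documented State enum at the end of the class.
 */
class TcpHandshakeLayoutSketch {
  private State currentSessionState = State.NONE;

  State getCurrentSessionState() {
    return currentSessionState;
  }

  void setCurrentSessionState(State state) {
    currentSessionState = state;
  }

  /** Lifecycle of a reassembled TCP session as observed in the capture (assumed meanings). */
  enum State {
    /** No handshake packet has been observed yet. */
    NONE,
    /** The three-way handshake completed and data may flow. */
    OPEN,
    /** The session ended with a complete FIN exchange. */
    CLOSED,
    /** A FIN was received, but the receiving side has not sent its own FIN yet. */
    CLOSE_WAIT,
    /** Both FINs were exchanged; waiting before the session is considered gone. */
    TIME_WAIT,
    /** The originator has sent the initial SYN. */
    SYN,
    /** The responder has answered with SYN/ACK. */
    SYNACK,
    /** The session ended abnormally, for example by RST or because the capture ended. */
    FORCED_CLOSED,
    /** One side has sent FIN and is waiting for the peer to finish. */
    FIN_WAIT
  }
}
```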
[jira] [Commented] (DRILL-7443) Enable PCAP Plugin to Reassemble TCP Streams
[ https://issues.apache.org/jira/browse/DRILL-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979204#comment-16979204 ]

ASF GitHub Bot commented on DRILL-7443:
---

arina-ielchiieva commented on pull request #1898: DRILL-7443: Enable PCAP Plugin to Reassemble TCP Streams
URL: https://github.com/apache/drill/pull/1898#discussion_r349042446

## File path: exec/java-exec/src/test/java/org/apache/drill/exec/store/pcap/TestSessionizePCAP.java
##
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.pcap;
+
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.ClusterTest;
+import org.joda.time.Period;
+import java.nio.file.Paths;
+import java.time.LocalDateTime;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import java.time.format.DateTimeFormatter;
+
+import static org.junit.Assert.assertEquals;
+
+public class TestSessionizePCAP extends ClusterTest{

Review comment:
```suggestion
public class TestSessionizePCAP extends ClusterTest {
```
[jira] [Commented] (DRILL-7443) Enable PCAP Plugin to Reassemble TCP Streams
[ https://issues.apache.org/jira/browse/DRILL-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979201#comment-16979201 ]

ASF GitHub Bot commented on DRILL-7443:
---

arina-ielchiieva commented on pull request #1898: DRILL-7443: Enable PCAP Plugin to Reassemble TCP Streams
URL: https://github.com/apache/drill/pull/1898#discussion_r349042959

## File path: exec/java-exec/src/test/java/org/apache/drill/exec/store/pcap/TestSessionizePCAP.java
##
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.pcap;
+
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.ClusterTest;
+import org.joda.time.Period;
+import java.nio.file.Paths;
+import java.time.LocalDateTime;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import java.time.format.DateTimeFormatter;
+
+import static org.junit.Assert.assertEquals;
+
+public class TestSessionizePCAP extends ClusterTest{
+
+ private static final DateTimeFormatter formatter = DateTimeFormatter.ofPattern("-MM-dd'T'HH:mm:ss.SSS");
+
+ @BeforeClass
+ public static void setup() throws Exception {
+ClusterTest.startCluster(ClusterFixture.builder(dirTestWatcher));
+
+PcapFormatConfig sampleConfig = new PcapFormatConfig();
+sampleConfig.sessionizeTCPStreams = true;
+
+cluster.defineFormat("cp", "pcap", sampleConfig);
+dirTestWatcher.copyResourceToRoot(Paths.get("store/pcap/"));
+ }
+
+ @Test
+ public void testSessionizedStarQuery() throws Exception {
+String sql = "SELECT * FROM cp.`/store/pcap/attack-trace.pcap` LIMIT 1";

Review comment:
LIMIT 1 does not guarantee which row will be returned. Please use a WHERE clause to select exactly the expected row.
[jira] [Commented] (DRILL-7443) Enable PCAP Plugin to Reassemble TCP Streams
[ https://issues.apache.org/jira/browse/DRILL-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979205#comment-16979205 ]

ASF GitHub Bot commented on DRILL-7443:
---

arina-ielchiieva commented on pull request #1898: DRILL-7443: Enable PCAP Plugin to Reassemble TCP Streams
URL: https://github.com/apache/drill/pull/1898#discussion_r349042181

## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/decoder/TcpSession.java
##
@@ -0,0 +1,334 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.pcap.decoder;
+
+import org.joda.time.Instant;
+import org.joda.time.Period;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.net.InetAddress;
+import java.util.ArrayList;
+import java.util.Collections;
+
+import static org.apache.drill.exec.store.pcap.PcapFormatUtils.parseBytesToASCII;
+
+/**
+ * This class is the representation of a TCP session.
+ */
+public class TcpSession {
+
+ private ArrayList packetsFromSender;
+ private ArrayList packetsFromReceiver;
+
+ private long startTime;
+ private long endTime;
+ private long sessionLength;
+ private int packetCount;
+ private InetAddress srcIP;
+ private InetAddress dstIP;
+ private int srcPort;
+ private int dstPort;
+ private String srcMac;
+ private String dstMac;
+ private long sessionID;
+ private TcpHandshake handshake;
+ private long synTime;
+ private long ackTime;
+ private long connectTime;
+ private byte[] sentData;
+ private byte[] receivedData;
+ private int sentDataSize;
+ private int receivedDataSize;
+ private boolean hasCorruptedData = false;
+
+
+ private static final Logger logger = LoggerFactory.getLogger(TcpSession.class);
+
+ public TcpSession (long sessionID) {
+packetsFromSender = new ArrayList<>();
+packetsFromReceiver = new ArrayList<>();
+
+handshake = new TcpHandshake();
+this.sessionID = sessionID;
+ }
+
+ /**
+ * This function adds a packet to the TCP session.
+ * @param p The Packet to be added to the session
+ */
+ public void addPacket(Packet p) {
+
+// Only attempt to add TCP packets to session
+if (!p.getPacketType().equalsIgnoreCase("TCP")) {
+ return;
+}
+
+// These variables should be consistent within a TCP session
+if (packetCount == 0) {
+ srcIP = p.getSrc_ip();
+ dstIP = p.getDst_ip();
+
+ srcPort = p.getSrc_port();
+ dstPort = p.getDst_port();
+
+ srcMac = p.getEthernetSource();
+ dstMac = p.getEthernetDestination();
+ startTime = p.getTimestamp();
+} else if (p.getSessionHash() != sessionID) {
+ logger.warn("Attempting to add session {} to incorrect TCP session.", sessionID);
+ return;
+}
+
+// Add packet to appropriate list and increment the data size counter
+if (p.getSrc_ip().getHostAddress().equalsIgnoreCase(srcIP.getHostAddress())) {
+ packetsFromSender.add(p);
+ // Increment the data size counters
+ if (p.getData() != null) {
+sentDataSize += p.getData().length;
+ }
+
+} else {
+ packetsFromReceiver.add(p);
+ if (p.getData() != null) {
+receivedDataSize += p.getData().length;
+ }
+}
+
+// Check flags if connection is not established
+if (!handshake.isConnected()) {
+ if (p.getSynFlag() && p.getSrc_ip().getHostAddress().equalsIgnoreCase(srcIP.getHostAddress())) {
+// This is part 1 of the TCP session handshake
+// The host sends the first SYN packet
+handshake.syn = true;
+handshake.setSyn();
+synTime = p.getTimestamp();
+ } else if (p.getSynFlag() && p.getAckFlag() && p.getSrc_ip().getHostAddress().equalsIgnoreCase(dstIP.getHostAddress())) {
+// This condition represents the second part of the TCP Handshake,
+// where the receiver sends a frame with the SYN/ACK flags set to the originator
+handshake.synAck = true;
+handshake.setAck();
+ } else if (p.getAckFlag() && p.getSrc_ip().getHostAddress().equalsIgnoreCase(srcIP.getHostAddress())) {
+// Finally, this condition represents a
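`addPacket` relies on two things: `p.getSessionHash()` must be identical for both directions of a connection, and a packet's direction is decided by comparing only its source IP with the first packet's source IP. The sketch below shows one way to build a direction-independent key (always hash the "smaller" endpoint first) and a direction test that also compares ports, which would matter when both ends share an address, such as loopback traffic. `Endpoint` and `SessionKeys` are hypothetical names; this is not how Drill's `Packet.getSessionHash()` is implemented, since that code is not shown here.

```java
import java.util.Objects;

/** Hypothetical (address, port) endpoint; not a Drill class. */
final class Endpoint implements Comparable<Endpoint> {
  final String address;
  final int port;

  Endpoint(String address, int port) {
    this.address = address;
    this.port = port;
  }

  /** Orders by address first, then by port, giving a stable "smaller" endpoint. */
  @Override
  public int compareTo(Endpoint other) {
    int byAddress = address.compareTo(other.address);
    return byAddress != 0 ? byAddress : Integer.compare(port, other.port);
  }
}

final class SessionKeys {
  private SessionKeys() {}

  /**
   * Direction-independent key: the smaller endpoint is always hashed first, so
   * A->B and B->A map to the same value. Distinct sessions can still collide,
   * because this is a hash rather than a unique identifier.
   */
  static long sessionKey(Endpoint src, Endpoint dst) {
    boolean srcFirst = src.compareTo(dst) <= 0;
    Endpoint first = srcFirst ? src : dst;
    Endpoint second = srcFirst ? dst : src;
    return Objects.hash(first.address, first.port, second.address, second.port);
  }

  /** True if the packet came from the session originator, matching address and port. */
  static boolean isFromOriginator(Endpoint packetSrc, Endpoint originator) {
    return packetSrc.address.equals(originator.address) && packetSrc.port == originator.port;
  }
}
```

Because the key is only a hash, a reader built this way would still need the mismatch check that `addPacket` already performs before trusting that two packets belong together.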
[jira] [Commented] (DRILL-7443) Enable PCAP Plugin to Reassemble TCP Streams
[ https://issues.apache.org/jira/browse/DRILL-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979196#comment-16979196 ] ASF GitHub Bot commented on DRILL-7443: --- arina-ielchiieva commented on pull request #1898: DRILL-7443: Enable PCAP Plugin to Reassemble TCP Streams URL: https://github.com/apache/drill/pull/1898#discussion_r349040487 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/decoder/TcpHandshake.java ## @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.drill.exec.store.pcap.decoder; + +/** + * This class is used to record the status of the TCP Handshake. Initially this is used just to determine whether a session is open or closed, but + * future functionality could include SYN flood identification, or other hackery with TCP flags. + */ +public class TcpHandshake { + boolean syn = false; Review comment: Class variables should be private and accessed through getters. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Enable PCAP Plugin to Reassemble TCP Streams
>
>
> Key: DRILL-7443
> URL: https://issues.apache.org/jira/browse/DRILL-7443
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - Other
> Affects Versions: 1.16.0
> Reporter: Charles Givre
> Assignee: Charles Givre
> Priority: Major
> Fix For: 1.17.0
>
>
> One common task in network forensics is reassembling TCP streams from
> captured network data. This PR adds this capability to Drill.
> h2. Usage
> To enable TCP re-sessionization, set the variable {{sessionizeTCPStreams}}
> to {{true}} in the configuration for the PCAP reader.
> This can also be done at query time with the {{table()}} function:
> {{SELECT * FROM table(dfs.test.`attack-trace.pcap` (type => 'pcap',
> sessionizeTCPStreams=> true))}}
> h3. Results
> *When this option is enabled, Drill will ignore all packets that are not TCP
> packets.*
> Executing a query with this option enabled changes the results Drill
> returns from PCAP files.
> You will get the following columns:
> * session_start_time: The start time of the session
> * session_end_time: The end time of the session
> * session_duration: The duration of the session. This will be a Drill PERIOD
> datatype.
> * total_packet_count: The number of packets in the session
> * connection_time: The amount of time it took for the TCP handshake to be
> completed. Useful for network diagnostics
> * src_ip: The IP address of the initiating machine
> * dst_ip: The IP address of the remote machine
> * src_port: The port of the originating machine
> * dst_port: The port of the remote machine
> * src_mac_address: The MAC address of the originating machine
> * dst_mac_address: The MAC address of the remote machine
> * tcp_session: The session hash for the TCP session (Long)
> * is_corrupt: True/false if the session contains corrupted packets
> * data_from_originator: The data sent from the originator
> * data_from_remote: The data sent from the remote machine
> * data_volume_from_remote: The number of bytes sent from the remote host
> * data_volume_from_origin: The number of bytes sent from the originating
> machine
> * packet_count_from_origin: The number of packets sent from the originating
> machine
> * packet_count_from_remote: The number of packets sent from the remote
> machine

-- This message was sent by Atlassian Jira (v8.3.4#803005)
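Here is a minimal sketch of what the reviewer's request could look like. It assumes the handshake only needs to track the three steps of the classic SYN, SYN/ACK, ACK exchange. This is not the code merged for PR #1898. Apart from `isConnected()`, which the `TcpSession` diff calls, the method names and the in-order rule are hypothetical.

```java
/**
 * Hypothetical sketch of TcpHandshake with private state, as requested in the
 * review. Only isConnected() appears in the PR diff; the other method names and
 * the ordering rules are illustrative assumptions.
 */
class TcpHandshake {
  // Private state: callers must go through the methods below instead of
  // writing fields such as handshake.syn directly.
  private boolean syn;
  private boolean synAck;
  private boolean ack;

  /** Step 1: the originator sends SYN. */
  public void recordSyn() {
    syn = true;
  }

  /** Step 2: the remote host answers with SYN/ACK; ignored if no SYN was seen. */
  public void recordSynAck() {
    if (syn) {
      synAck = true;
    }
  }

  /** Step 3: the originator acknowledges the SYN/ACK; ignored if out of order. */
  public void recordAck() {
    if (syn && synAck) {
      ack = true;
    }
  }

  public boolean isSyn() {
    return syn;
  }

  public boolean isSynAck() {
    return synAck;
  }

  /** Assumption: the session counts as open once all three steps were seen in order. */
  public boolean isConnected() {
    return syn && synAck && ack;
  }
}
```

With the fields private, a caller such as `TcpSession.addPacket()` can no longer both assign `handshake.syn = true` and call a setter for the same state. Each transition has exactly one entry point.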
[jira] [Commented] (DRILL-7443) Enable PCAP Plugin to Reassemble TCP Streams
[ https://issues.apache.org/jira/browse/DRILL-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979202#comment-16979202 ]

ASF GitHub Bot commented on DRILL-7443:
---

arina-ielchiieva commented on pull request #1898: DRILL-7443: Enable PCAP Plugin to Reassemble TCP Streams
URL: https://github.com/apache/drill/pull/1898#discussion_r349041414

## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/decoder/TcpSession.java ##
@@ -0,0 +1,334 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.pcap.decoder;
+
+import org.joda.time.Instant;
+import org.joda.time.Period;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.net.InetAddress;
+import java.util.ArrayList;
+import java.util.Collections;
+
+import static org.apache.drill.exec.store.pcap.PcapFormatUtils.parseBytesToASCII;
+
+/**
+ * This class is the representation of a TCP session.
+ */
+public class TcpSession {
+
+  private ArrayList packetsFromSender;
+  private ArrayList packetsFromReceiver;
+
+  private long startTime;
+  private long endTime;
+  private long sessionLength;
+  private int packetCount;
+  private InetAddress srcIP;
+  private InetAddress dstIP;
+  private int srcPort;
+  private int dstPort;
+  private String srcMac;
+  private String dstMac;
+  private long sessionID;
+  private TcpHandshake handshake;
+  private long synTime;
+  private long ackTime;
+  private long connectTime;
+  private byte[] sentData;
+  private byte[] receivedData;
+  private int sentDataSize;
+  private int receivedDataSize;
+  private boolean hasCorruptedData = false;
+
+
+  private static final Logger logger = LoggerFactory.getLogger(TcpSession.class);
+
+  public TcpSession (long sessionID) {
+    packetsFromSender = new ArrayList<>();
+    packetsFromReceiver = new ArrayList<>();
+
+    handshake = new TcpHandshake();
+    this.sessionID = sessionID;
+  }
+
+  /**
+   * This function adds a packet to the TCP session.
+   * @param p The Packet to be added to the session
+   */
+  public void addPacket(Packet p) {
+
+    // Only attempt to add TCP packets to session
+    if (!p.getPacketType().equalsIgnoreCase("TCP")) {
+      return;
+    }
+
+    // These variables should be consistent within a TCP session
+    if (packetCount == 0) {
+      srcIP = p.getSrc_ip();
+      dstIP = p.getDst_ip();
+
+      srcPort = p.getSrc_port();
+      dstPort = p.getDst_port();
+
+      srcMac = p.getEthernetSource();
+      dstMac = p.getEthernetDestination();
+      startTime = p.getTimestamp();
+    } else if (p.getSessionHash() != sessionID) {
+      logger.warn("Attempting to add session {} to incorrect TCP session.", sessionID);
+      return;
+    }
+
+    // Add packet to appropriate list and increment the data size counter
+    if (p.getSrc_ip().getHostAddress().equalsIgnoreCase(srcIP.getHostAddress())) {
+      packetsFromSender.add(p);
+      // Increment the data size counters
+      if (p.getData() != null) {
+        sentDataSize += p.getData().length;
+      }
+
+    } else {
+      packetsFromReceiver.add(p);
+      if (p.getData() != null) {
+        receivedDataSize += p.getData().length;
+      }
+    }
+
+    // Check flags if connection is not established
+    if (!handshake.isConnected()) {
+      if (p.getSynFlag() && p.getSrc_ip().getHostAddress().equalsIgnoreCase(srcIP.getHostAddress())) {
+        // This is part 1 of the TCP session handshake
+        // The host sends the first SYN packet
+        handshake.syn = true;
+        handshake.setSyn();
+        synTime = p.getTimestamp();
+      } else if (p.getSynFlag() && p.getAckFlag() && p.getSrc_ip().getHostAddress().equalsIgnoreCase(dstIP.getHostAddress())) {
+        // This condition represents the second part of the TCP Handshake,
+        // where the receiver sends a frame with the SYN/ACK flags set to the originator
+        handshake.synAck = true;
+        handshake.setAck();
+      } else if (p.getAckFlag() && p.getSrc_ip().getHostAddress().equalsIgnoreCase(srcIP.getHostAddress())) {
+        // Finally, this condition represents a
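The `connection_time` column is described as the time it took for the TCP handshake to complete. The sketch below shows one way to derive it from the three handshake steps that `addPacket()` handles. It assumes the originator's SYN marks the start and its final ACK marks the end. `HandshakeTimer` and `Segment` are hypothetical stand-ins, not Drill's `Packet` API.

```java
import java.util.List;
import java.util.OptionalLong;

/**
 * Hypothetical sketch of SYN -> SYN/ACK -> ACK detection and timing, in the
 * spirit of the handshake branches in TcpSession.addPacket(). Segment and its
 * millisecond timestamps are assumptions, not Drill's Packet API.
 */
class HandshakeTimer {

  /** Minimal stand-in for a decoded TCP packet (hypothetical). */
  static final class Segment {
    final long timestampMillis;
    final boolean syn;
    final boolean ack;
    final boolean fromOriginator;

    Segment(long timestampMillis, boolean syn, boolean ack, boolean fromOriginator) {
      this.timestampMillis = timestampMillis;
      this.syn = syn;
      this.ack = ack;
      this.fromOriginator = fromOriginator;
    }
  }

  /** Time from the originator's SYN to its final ACK, or empty if the handshake never completes. */
  static OptionalLong connectionTimeMillis(List<Segment> segments) {
    boolean synSeen = false;
    boolean synAckSeen = false;
    long synTime = 0;
    for (Segment s : segments) {
      if (!synSeen) {
        // Step 1: a pure SYN from the originator opens the handshake.
        if (s.syn && !s.ack && s.fromOriginator) {
          synSeen = true;
          synTime = s.timestampMillis;
        }
      } else if (!synAckSeen) {
        // Step 2: the remote host replies with SYN/ACK.
        if (s.syn && s.ack && !s.fromOriginator) {
          synAckSeen = true;
        }
      } else if (s.ack && !s.syn && s.fromOriginator) {
        // Step 3: the originator's ACK completes the handshake.
        return OptionalLong.of(s.timestampMillis - synTime);
      }
    }
    return OptionalLong.empty();
  }
}
```

For example, a SYN at 1000 ms, a SYN/ACK at 1060 ms, and an ACK at 1119 ms give 119 ms. A capture that never shows the final ACK yields an empty result rather than a misleading zero.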
[jira] [Commented] (DRILL-7443) Enable PCAP Plugin to Reassemble TCP Streams
[ https://issues.apache.org/jira/browse/DRILL-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979193#comment-16979193 ]

ASF GitHub Bot commented on DRILL-7443:
---

arina-ielchiieva commented on pull request #1898: DRILL-7443: Enable PCAP Plugin to Reassemble TCP Streams
URL: https://github.com/apache/drill/pull/1898#discussion_r349038567

## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/PcapBatchReader.java ##
@@ -116,64 +118,71 @@ private ScalarWriter isCorruptWriter;
+  private PcapReaderConfig readerConfig;
+
+
+  // Writers for TCP Sessions
+  private ScalarWriter sessionStartTimeWriter;
+
+  private ScalarWriter sessionEndTimeWriter;
+
+  private ScalarWriter sessionDurationWriter;
+
+  private ScalarWriter connectionTimeWriter;
+
+  private ScalarWriter packetCountWriter;
+
+  private ScalarWriter originPacketCounterWriter;
+
+  private ScalarWriter remotePacketCounterWriter;
+
+  private ScalarWriter originDataVolumeWriter;
+
+  private ScalarWriter remoteDataVolumeWriter;
+
+  private ScalarWriter hostDataWriter;
+
+  private ScalarWriter remoteDataWriter;
+
+
+  private HashMap sessionQueue;

Review comment:
Map
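The one-word review comment `Map` most likely asks for the field to be declared by its interface rather than as `HashMap`. The sketch below shows that pattern. The `Long` key (the `tcp_session` hash) and the generic session type are assumptions, because the archived diff lost the original type parameters.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical illustration of the "Map" review comment. The Long key (the
 * tcp_session hash) and the generic session type S are assumptions; the
 * archived diff lost the original type parameters.
 */
class SessionQueueHolder<S> {
  // Declared as the Map interface; HashMap is only an implementation detail,
  // so it could later be swapped (e.g. for LinkedHashMap) without touching callers.
  private final Map<Long, S> sessionQueue = new HashMap<>();

  void put(long sessionHash, S session) {
    sessionQueue.put(sessionHash, session);
  }

  S get(long sessionHash) {
    return sessionQueue.get(sessionHash);
  }

  int size() {
    return sessionQueue.size();
  }
}
```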
[jira] [Commented] (DRILL-7443) Enable PCAP Plugin to Reassemble TCP Streams
[ https://issues.apache.org/jira/browse/DRILL-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979195#comment-16979195 ]

ASF GitHub Bot commented on DRILL-7443:
---

arina-ielchiieva commented on pull request #1898: DRILL-7443: Enable PCAP Plugin to Reassemble TCP Streams
URL: https://github.com/apache/drill/pull/1898#discussion_r349039991

## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/decoder/Packet.java ##
@@ -491,4 +528,9 @@ private int getPort(int offset) {
     int dstPortOffset = ipOffset + getIPHeaderLength() + offset;
     return convertShort(raw, dstPortOffset);
   }
+
+  @Override
+  public int compareTo(@Nonnull Packet o) {

Review comment:
Please remove annotation.
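Without the annotation, the override is plain `Comparable` code. The sketch below assumes, hypothetically, that packets are ordered by capture timestamp. `TimedPacket` is an illustrative stand-in for `Packet`, not the class from the PR.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch of a Comparable packet without the @Nonnull parameter
 * annotation. Ordering by capture timestamp is an assumption about what
 * Packet.compareTo() is meant to do.
 */
class TimedPacket implements Comparable<TimedPacket> {
  private final long timestamp;

  TimedPacket(long timestamp) {
    this.timestamp = timestamp;
  }

  long getTimestamp() {
    return timestamp;
  }

  @Override
  public int compareTo(TimedPacket o) {
    // Long.compare avoids the overflow that (int) (timestamp - o.timestamp) could cause.
    return Long.compare(timestamp, o.timestamp);
  }

  /** Returns a copy of the packets sorted by their natural (timestamp) order. */
  static List<TimedPacket> sortedCopy(List<TimedPacket> packets) {
    List<TimedPacket> copy = new ArrayList<>(packets);
    Collections.sort(copy);
    return copy;
  }
}
```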
[jira] [Commented] (DRILL-7443) Enable PCAP Plugin to Reassemble TCP Streams
[ https://issues.apache.org/jira/browse/DRILL-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979199#comment-16979199 ]

ASF GitHub Bot commented on DRILL-7443:
---

arina-ielchiieva commented on pull request #1898: DRILL-7443: Enable PCAP Plugin to Reassemble TCP Streams
URL: https://github.com/apache/drill/pull/1898#discussion_r349042748

## File path: exec/java-exec/src/test/java/org/apache/drill/exec/store/pcap/TestSessionizePCAP.java ##
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.pcap;
+
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.ClusterTest;
+import org.joda.time.Period;
+import java.nio.file.Paths;
+import java.time.LocalDateTime;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import java.time.format.DateTimeFormatter;
+
+import static org.junit.Assert.assertEquals;
+
+public class TestSessionizePCAP extends ClusterTest {
+
+  private static final DateTimeFormatter formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS");
+
+  @BeforeClass
+  public static void setup() throws Exception {
+    ClusterTest.startCluster(ClusterFixture.builder(dirTestWatcher));
+
+    PcapFormatConfig sampleConfig = new PcapFormatConfig();
+    sampleConfig.sessionizeTCPStreams = true;
+
+    cluster.defineFormat("cp", "pcap", sampleConfig);
+    dirTestWatcher.copyResourceToRoot(Paths.get("store/pcap/"));
+  }
+
+  @Test
+  public void testSessionizedStarQuery() throws Exception {
+    String sql = "SELECT * FROM cp.`/store/pcap/attack-trace.pcap` LIMIT 1";
+
+    testBuilder()
+      .sqlQuery(sql)
+      .ordered()
+      .baselineColumns("session_start_time", "session_end_time", "session_duration", "total_packet_count", "connection_time", "src_ip", "dst_ip", "src_port", "dst_port",
+        "src_mac_address", "dst_mac_address", "tcp_session", "is_corrupt", "data_from_originator", "data_from_remote", "data_volume_from_origin",
+        "data_volume_from_remote", "packet_count_from_origin", "packet_count_from_remote")
+      .baselineValues(LocalDateTime.parse("2009-04-20T03:28:28.374", formatter),
+        LocalDateTime.parse("2009-04-20T03:28:28.508", formatter),
+        Period.parse("PT0.134S"), 4,
+        Period.parse("PT0.119S"),
+        "98.114.205.102",
+        "192.150.11.111",
+        1821, 445,
+        "00:08:E2:3B:56:01",
+        "00:30:48:62:4E:4A",
+        -8791568836279708938L,
+        false,
+        "I>...>..Ib...<...<..I>...>", "", 62, 0, 3, 1)
+      .go();
+  }
+
+  @Test
+  public void testSessionizedSpecificQuery() throws Exception {
+    String sql = "SELECT session_start_time, session_end_time,session_duration, total_packet_count, connection_time, src_ip, dst_ip, src_port, dst_port, src_mac_address, dst_mac_address, tcp_session, " +
+        "is_corrupt, data_from_originator, data_from_remote, data_volume_from_origin, data_volume_from_remote, packet_count_from_origin, packet_count_from_remote " +
+        "FROM cp.`/store/pcap/attack-trace.pcap` LIMIT 1";
+
+    testBuilder()
+      .sqlQuery(sql)
+      .ordered()
+      .baselineColumns("session_start_time", "session_end_time", "session_duration", "total_packet_count", "connection_time", "src_ip", "dst_ip", "src_port", "dst_port",
+        "src_mac_address", "dst_mac_address", "tcp_session", "is_corrupt", "data_from_originator", "data_from_remote", "data_volume_from_origin",
+        "data_volume_from_remote", "packet_count_from_origin", "packet_count_from_remote")
+      .baselineValues(LocalDateTime.parse("2009-04-20T03:28:28.374", formatter),
+        LocalDateTime.parse("2009-04-20T03:28:28.508", formatter),
+        Period.parse("PT0.134S"), 4,
+        Period.parse("PT0.119S"),
+        "98.114.205.102",
+        "192.150.11.111",
+        1821, 445,
+        "00:08:E2:3B:56:01",
+        "00:30:48:62:4E:4A",
+        -8791568836279708938L,
+        false,
+        "I>...>..Ib...<...<..I>...>", "", 62, 0, 3, 1)
+      .go();
+  }
+
+  @Test
+  public void testSerDe() throws Exception {
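The baseline values in this test can be cross-checked with `java.time` alone (this check is not part of the PR). `session_end_time` minus `session_start_time` should equal `session_duration`, which is `PT0.134S`. The formatter spells out the `yyyy` year field, which `LocalDateTime.parse` needs for these strings. `BaselineCheck` is a hypothetical helper name.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/**
 * Sanity check (not part of the PR): session_end_time minus
 * session_start_time should equal the expected session_duration.
 */
class BaselineCheck {
  // Full pattern including the four-digit year field.
  static final DateTimeFormatter FORMATTER = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS");

  static Duration sessionDuration(String start, String end) {
    return Duration.between(LocalDateTime.parse(start, FORMATTER), LocalDateTime.parse(end, FORMATTER));
  }

  public static void main(String[] args) {
    Duration d = sessionDuration("2009-04-20T03:28:28.374", "2009-04-20T03:28:28.508");
    System.out.println(d); // PT0.134S
  }
}
```

The same arithmetic confirms `total_packet_count` = 4 as the sum of `packet_count_from_origin` (3) and `packet_count_from_remote` (1).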
[jira] [Commented] (DRILL-7443) Enable PCAP Plugin to Reassemble TCP Streams
[ https://issues.apache.org/jira/browse/DRILL-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979200#comment-16979200 ]

ASF GitHub Bot commented on DRILL-7443:
---

arina-ielchiieva commented on pull request #1898: DRILL-7443: Enable PCAP Plugin to Reassemble TCP Streams
URL: https://github.com/apache/drill/pull/1898#discussion_r349041514

## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/decoder/TcpSession.java ##
@@ -0,0 +1,334 @@
[jira] [Created] (DRILL-7452) Support comparison operator for Array
benj created DRILL-7452:
---

Summary: Support comparison operator for Array
Key: DRILL-7452
URL: https://issues.apache.org/jira/browse/DRILL-7452
Project: Apache Drill
Issue Type: Wish
Components: Functions - Drill
Affects Versions: 1.16.0
Reporter: benj
Attachments: example_array.parquet

It would be useful to have a comparison operator for nested types, or at least for arrays.

Sample file attached: example_array.parquet

{code:sql}
/* This works */
apache drill(1.16)> SELECT id, tags FROM `example_array.parquet`;
+--------+-----------+
|   id   |   tags    |
+--------+-----------+
| 7b8808 | [1,2,3]   |
| 7b8808 | [1,20,3]  |
| 55a4be | [1,3,5,6] |
+--------+-----------+

/* But DISTINCT and ORDER BY cannot be used on the field tags (ARRAY) */
/* https://drill.apache.org/docs/nested-data-limitations/ */
apache drill(1.16)> SELECT DISTINCT id, tags FROM `example_array.parquet` ORDER BY tags;
Error: SYSTEM ERROR: UnsupportedOperationException: Map, Array, Union or repeated scalar type should not be used in group by, order by or in a comparison operator. Drill does not support compare between BIGINT:REPEATED and BIGINT:REPEATED.
{code}

The same query works in Postgres:

{code:sql}
SELECT DISTINCT id, tags
FROM (
  SELECT '7b8808' AS id, ARRAY[1,2,3] tags
  UNION SELECT '7b8808', ARRAY[1,20,3]
  UNION SELECT '55a4be', ARRAY[1,3,5,6]
) x
ORDER BY tags;

7b8808;{1,2,3}
55a4be;{1,3,5,6}
7b8808;{1,20,3}
{code}
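The ordering requested here is lexicographic: compare element by element and let the first difference decide, as the Postgres output above shows. Below is a plain-Java sketch of that rule (not Drill code) using `java.util.Arrays.compare(long[], long[])`, available since Java 9. With this method a proper prefix sorts before the longer array. The `Row` type and `BY_TAGS` comparator are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/**
 * Plain-Java sketch (not Drill code) of element-by-element array ordering:
 * [1,2,3] < [1,3,5,6] < [1,20,3], because the first differing element decides.
 */
class ArrayOrdering {

  /** Hypothetical row: an id plus a BIGINT array. */
  static final class Row {
    final String id;
    final long[] tags;

    Row(String id, long... tags) {
      this.id = id;
      this.tags = tags;
    }

    @Override
    public String toString() {
      return id + ";" + Arrays.toString(tags);
    }
  }

  /** Lexicographic order on tags (Arrays.compare, Java 9+), then on id. */
  static final Comparator<Row> BY_TAGS =
      Comparator.<Row, long[]>comparing(r -> r.tags, Arrays::compare)
          .thenComparing(r -> r.id);

  public static void main(String[] args) {
    List<Row> rows = new ArrayList<>(List.of(
        new Row("7b8808", 1, 2, 3),
        new Row("7b8808", 1, 20, 3),
        new Row("55a4be", 1, 3, 5, 6)));
    rows.sort(BY_TAGS);
    rows.forEach(System.out::println);
  }
}
```

Running `main` prints the rows in the same order as the Postgres result, with `[1, 3, 5, 6]` between `[1, 2, 3]` and `[1, 20, 3]`.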
[jira] [Updated] (DRILL-7379) Planning error
[ https://issues.apache.org/jira/browse/DRILL-7379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

benj updated DRILL-7379:
    Description: 
sample file: [^example.parquet]

With data as:
{code:sql}
SELECT id, tags FROM `example_parquet`;
+--------+------------------------------------+
|   id   |                tags                |
+--------+------------------------------------+
| 7b8808 | ["peexe","signed","overlay"]       |
| 55a4ae | ["peexe","signed","upx","overlay"] |
+--------+------------------------------------+
{code}

The following query works:
{code:sql}
SELECT id, flatten(tags) tag
FROM (
  SELECT id, any_value(tags) tags
  FROM `example_parquet`
  GROUP BY id
) LIMIT 2;
+--------+--------+
|   id   |  tag   |
+--------+--------+
| 55a4ae | peexe  |
| 55a4ae | signed |
+--------+--------+
{code}

But, unexpectedly, the following query fails:
{code:sql}
SELECT tag, count(*)
FROM (
  SELECT flatten(tags) tag
  FROM (
    SELECT id, any_value(tags) tags
    FROM `example_parquet`
    GROUP BY id
  )
) GROUP BY tag;
Error: SYSTEM ERROR: UnsupportedOperationException: Map, Array, Union or repeated scalar type should not be used in group by, order by or in a comparison operator. Drill does not support compare between MAP:REPEATED and MAP:REPEATED.

/* Or, with another data set, a different error:
Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize incoming schema. Errors:

Error in expression at index 0. Error: Missing function implementation: [hash32asdouble(MAP-REPEATED, INT-REQUIRED)]. Full expression: null..
*/
{code}

These errors are puzzling because the aggregation is on a VARCHAR column.
Moreover, the query works if it is split into two queries with an intermediate table, as below:
{code:sql}
CREATE TABLE `tmp.parquet` AS (
  SELECT id, flatten(tags) tag
  FROM (
    SELECT id, any_value(tags) tags
    FROM `example_parquet`
    GROUP BY id
  ));

SELECT tag, count(*) c FROM `tmp_parquet` GROUP BY tag;
+---------+---+
|   tag   | c |
+---------+---+
| overlay | 2 |
| peexe   | 2 |
| signed  | 2 |
| upx     | 1 |
+---------+---+
{code}

  was:
With data as:
{code:sql}
SELECT id, tags FROM `example_parquet`;
+--------+------------------------------------+
|   id   |                tags                |
+--------+------------------------------------+
| 7b8808 | ["peexe","signed","overlay"]       |
| 55a4ae | ["peexe","signed","upx","overlay"] |
+--------+------------------------------------+
{code}

The following query works:
{code:sql}
SELECT id, flatten(tags) tag
FROM (
  SELECT id, any_value(tags) tags
  FROM `example_parquet`
  GROUP BY id
) LIMIT 2;
+--------+--------+
|   id   |  tag   |
+--------+--------+
| 55a4ae | peexe  |
| 55a4ae | signed |
+--------+--------+
{code}

But, unexpectedly, the following query fails:
{code:sql}
SELECT tag, count(*)
FROM (
  SELECT flatten(tags) tag
  FROM (
    SELECT id, any_value(tags) tags
    FROM `example_parquet`
    GROUP BY id
  )
) GROUP BY tag;
Error: SYSTEM ERROR: UnsupportedOperationException: Map, Array, Union or repeated scalar type should not be used in group by, order by or in a comparison operator. Drill does not support compare between MAP:REPEATED and MAP:REPEATED.

/* Or, with another data set, a different error:
Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize incoming schema. Errors:

Error in expression at index 0. Error: Missing function implementation: [hash32asdouble(MAP-REPEATED, INT-REQUIRED)]. Full expression: null..
*/
{code}

These errors are puzzling because the aggregation is on a VARCHAR column.
Moreover, the query works if it is split into two queries with an intermediate table, as below:
{code:sql}
CREATE TABLE `tmp.parquet` AS (
  SELECT id, flatten(tags) tag
  FROM (
    SELECT id, any_value(tags) tags
    FROM `example_parquet`
    GROUP BY id
  ));

SELECT tag, count(*) c FROM `tmp_parquet` GROUP BY tag;
+---------+---+
|   tag   | c |
+---------+---+
| overlay | 2 |
| peexe   | 2 |
| signed  | 2 |
| upx     | 1 |
+---------+---+
{code}


> Planning error
> --------------
>
> Key: DRILL-7379
> URL: https://issues.apache.org/jira/browse/DRILL-7379
> Project: Apache Drill
> Issue Type: Bug
> Components: Functions - Drill
> Affects Versions: 1.16.0
> Reporter: benj
> Priority: Major
> Attachments: example.parquet
>
>
> sample file: [^example.parquet]
> With data as:
> {code:sql}
> SELECT id, tags FROM `example_parquet`;
> +--------+------------------------------------+
> |   id   |                tags                |
> +--------+------------------------------------+
> | 7b8808 | ["peexe","signed","overlay"]       |
> | 55a4ae | ["peexe","signed","upx","overlay"] |
> +--------+------------------------------------+
>
[jira] [Updated] (DRILL-7379) Planning error
[ https://issues.apache.org/jira/browse/DRILL-7379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

benj updated DRILL-7379:
    Attachment: example.parquet
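For reference, the result the failing DRILL-7379 query should produce can be sketched in plain Java (not Drill code) on the sample rows: flatten each id's tags, then count per tag. The grouping key is an ordinary string, which is why an error about `MAP:REPEATED` is unexpected. The expected counts match the output of the two-step workaround in the report. `FlattenCount` is a hypothetical helper.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Plain-Java sketch (not Drill code) of what
 * SELECT tag, count(*) FROM (SELECT flatten(tags) tag ...) GROUP BY tag
 * should compute: the grouping key is a plain String (VARCHAR).
 */
class FlattenCount {

  static Map<String, Integer> countTags(List<List<String>> tagsPerId) {
    Map<String, Integer> counts = new TreeMap<>(); // sorted by tag for readable output
    for (List<String> tags : tagsPerId) {          // one entry per id (GROUP BY id, any_value)
      for (String tag : tags) {                    // flatten(tags)
        counts.merge(tag, 1, Integer::sum);        // GROUP BY tag, count(*)
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    List<List<String>> sample = List.of(
        List.of("peexe", "signed", "overlay"),
        List.of("peexe", "signed", "upx", "overlay"));
    System.out.println(countTags(sample)); // {overlay=2, peexe=2, signed=2, upx=1}
  }
}
```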