I think that accessing fields in corrupted packets will also cause exceptions. But this is a great start. Conditionalizing field access on !is_corrupt() might be sufficient for the next step.
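
To make the idea concrete, here is a rough sketch of that guard in Python. It is not Drill's actual reader code: the Packet class, its src_port() accessor, the byte offsets, and to_row() are all hypothetical names made up for illustration. Only the is_corrupt flag comes from the PR. The point is simply that the field is left null for a corrupt packet instead of being decoded.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    """Hypothetical stand-in for a decoded PCAP packet (not Drill's class)."""
    data: bytes
    corrupt: bool = False

    def is_corrupt(self) -> bool:
        return self.corrupt

    def src_port(self) -> int:
        # Made-up layout: a big-endian 16-bit port in the first two bytes.
        # Raises IndexError if the packet data is truncated.
        return (self.data[0] << 8) | self.data[1]


def to_row(packet: Packet) -> dict:
    """Build one output row, touching packet fields only when it is safe."""
    row = {"is_corrupt": packet.is_corrupt(), "src_port": None}
    if not packet.is_corrupt():
        # Field access is conditional on the packet not being corrupt.
        row["src_port"] = packet.src_port()
    return row
```

With this shape, a truncated packet flagged as corrupt yields a row with is_corrupt set to true and a null src_port, and the scan keeps going.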
On Sun, Feb 10, 2019 at 4:58 AM Charles Givre <[email protected]> wrote:

> All,
> I posted the following PR for this issue:
> https://github.com/apache/drill/pull/1637
>
> Basically, this PR does two things:
> 1. It creates a boolean column called is_corrupt, and
> 2. If the PCAP file has a corrupt row, it marks that row as corrupt by
>    setting is_corrupt to true and keeps going.
>
> With the example from Giovanni, I was able to find 590 or so corrupt rows
> out of 7000 in that PCAP file. It was late and I don’t know if that was
> what it was supposed to find, but it worked and I was able to query that.
> If you guys could send a few more examples, I’d like to test this on other
> files to make sure it works with them. We’re also going to have to do the
> same thing for the PCAP-NG format, I would assume.
>
> > On Feb 10, 2019, at 03:07, Ted Dunning <[email protected]> wrote:
> >
> > On Sat, Feb 9, 2019 at 2:25 PM Bob Rudis <[email protected]> wrote:
> >
> >> ...
> >> And, I did indeed find a few and am just waiting for a formal review so I
> >> can submit them for the Drill dev & tests.
> >>
> >
> > Awesome!
