If I can get some more examples of corrupted files, I'll test more thoroughly. Also, we'll need to apply the same methodology to PCAP-NG, so I'll need some examples there as well. My strategy is going to be to get as much data as possible out of each corrupt packet. -- C
> On Feb 10, 2019, at 10:54, Ted Dunning <[email protected]> wrote:
>
> I think that accessing fields in corrupted packets will also cause
> exceptions. But this is a great start. Conditionalizing field access on
> !is_corrupt() might be sufficient for the next step.
>
> On Sun, Feb 10, 2019 at 4:58 AM Charles Givre <[email protected]> wrote:
>
>> All,
>> I posted the following PR for this issue:
>> https://github.com/apache/drill/pull/1637
>>
>> Basically this PR does two things:
>> 1. It creates a boolean column called is_corrupt, and
>> 2. If the PCAP file has a corrupt row, it marks that row as corrupt by
>> setting is_corrupt to true and keeps going.
>>
>> With the example from Giovanni, I was able to find 590 or so corrupt rows
>> out of 7000 in that PCAP file. It was late and I don't know if that was
>> what it was supposed to find, but it worked and I was able to query it.
>> If you guys could send a few more examples, I'd like to test this on other
>> files to make sure it works with them. We're also going to have to do the
>> same thing for the PCAP-NG format, I would assume.
>>
>>> On Feb 10, 2019, at 03:07, Ted Dunning <[email protected]> wrote:
>>>
>>> On Sat, Feb 9, 2019 at 2:25 PM Bob Rudis <[email protected]> wrote:
>>>
>>>> ...
>>>> And, I did indeed find a few and am just waiting for a formal review so I
>>>> can submit them for the Drill dev & tests.
>>>
>>> Awesome!
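For illustration, here is a minimal Java sketch of the "mark the row corrupt and keep going" approach described in this thread, written against the classic (non-NG) PCAP layout: a 24-byte global header (snap length at byte 16), then records that each start with a 16-byte header whose third and fourth fields are the captured and original lengths. It is not the code in PR 1637. The class name `PcapCorruptionScan` and the corruption heuristics (a record running past end of file, or a captured length larger than the original length or the snap length) are assumptions chosen for the example. In the spirit of Ted's suggestion, a consumer would only decode packet fields from rows where `isCorrupt` is false.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: flag suspicious classic-PCAP records instead of throwing. */
final class PcapCorruptionScan {

    // Classic PCAP sizes: 24-byte global header, 16-byte per-record header.
    static final int GLOBAL_HEADER_LEN = 24;
    static final int RECORD_HEADER_LEN = 16;

    /** One scanned record; isCorrupt plays the role of the is_corrupt column. */
    static final class Row {
        final long offset;
        final int capturedLength;
        final boolean isCorrupt;

        Row(long offset, int capturedLength, boolean isCorrupt) {
            this.offset = offset;
            this.capturedLength = capturedLength;
            this.isCorrupt = isCorrupt;
        }
    }

    static List<Row> scan(byte[] file) {
        if (file.length < GLOBAL_HEADER_LEN) {
            throw new IllegalArgumentException("too short for a PCAP global header");
        }
        // Detect byte order from the magic number (microsecond or nanosecond variant).
        ByteBuffer buf = ByteBuffer.wrap(file).order(ByteOrder.LITTLE_ENDIAN);
        int magic = buf.getInt(0);
        if (magic != 0xa1b2c3d4 && magic != 0xa1b23c4d) {
            buf.order(ByteOrder.BIG_ENDIAN);
            magic = buf.getInt(0);
            if (magic != 0xa1b2c3d4 && magic != 0xa1b23c4d) {
                throw new IllegalArgumentException("not a classic PCAP file");
            }
        }
        long snapLen = Integer.toUnsignedLong(buf.getInt(16));

        List<Row> rows = new ArrayList<>();
        int pos = GLOBAL_HEADER_LEN;
        // A trailing partial record header (fewer than 16 bytes) is ignored in this sketch.
        while (pos + RECORD_HEADER_LEN <= file.length) {
            long inclLen = Integer.toUnsignedLong(buf.getInt(pos + 8));
            long origLen = Integer.toUnsignedLong(buf.getInt(pos + 12));
            long dataStart = pos + RECORD_HEADER_LEN;
            boolean truncated = dataStart + inclLen > file.length;
            // Assumed heuristics: lengths that contradict each other or the snap length.
            boolean implausible = inclLen > origLen || (snapLen > 0 && inclLen > snapLen);
            int available = (int) Math.min(inclLen, file.length - dataStart);
            rows.add(new Row(pos, available, truncated || implausible));
            if (truncated) {
                break; // the next record boundary would lie past end of file
            }
            pos = (int) (dataStart + inclLen); // keep going with the next record
        }
        return rows;
    }
}
```

Stopping at a truncated record is the conservative choice here: once a length field is wrong, the next record boundary is unknown, so resynchronizing would need extra logic (for example, scanning ahead for a plausible record header), which this sketch does not attempt. PCAP-NG uses a different block-based structure and would need its own scanner.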
