If I can get some more examples of corrupted files I’ll test more thoroughly.  
Also, we’ll need to apply the same methodology to PCAP-NG, so I’ll need some 
examples there as well.  My strategy is going to be to get as much data as 
possible out of the corrupt packet. 
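
To make the idea concrete, here is a rough Python sketch. It is not the Drill code from the PR, just an illustration of "flag the record and keep what we can". The names read_packets and ether_type are made up for this example, the corruption checks (length larger than snaplen, larger than orig_len, or larger than the bytes left in the file) are my own guesses at reasonable heuristics, and it only handles little-endian classic PCAP. The second helper follows Ted's suggestion of only touching fields when the record is not corrupt.

```python
import struct

# Classic PCAP layouts (little-endian, microsecond timestamps only).
PCAP_GLOBAL_HEADER = struct.Struct("<IHHiIII")  # magic, major, minor, tz, sigfigs, snaplen, linktype
PCAP_RECORD_HEADER = struct.Struct("<IIII")     # ts_sec, ts_usec, incl_len, orig_len
MAGIC_LE = 0xA1B2C3D4


def read_packets(buf):
    """Yield one dict per record; damaged records are flagged instead of raising.

    Hypothetical helper: the corruption heuristics below are assumptions.
    """
    magic, _maj, _min, _tz, _sig, snaplen, _link = PCAP_GLOBAL_HEADER.unpack_from(buf, 0)
    if magic != MAGIC_LE:
        raise ValueError("this sketch only handles little-endian classic PCAP")
    offset = PCAP_GLOBAL_HEADER.size
    while offset < len(buf):
        if len(buf) - offset < PCAP_RECORD_HEADER.size:
            # Not even a full record header left: keep the raw tail and stop.
            yield {"is_corrupt": True, "timestamp": None, "data": buf[offset:]}
            return
        ts_sec, ts_usec, incl_len, orig_len = PCAP_RECORD_HEADER.unpack_from(buf, offset)
        offset += PCAP_RECORD_HEADER.size
        available = len(buf) - offset
        is_corrupt = incl_len > snaplen or incl_len > orig_len or incl_len > available
        take = min(incl_len, available)  # salvage whatever bytes are actually there
        yield {
            "is_corrupt": is_corrupt,
            "timestamp": ts_sec + ts_usec / 1e6,
            "data": buf[offset:offset + take],
        }
        offset += take


def ether_type(pkt):
    """Decode the EtherType only for records that are safe to parse."""
    if pkt["is_corrupt"] or len(pkt["data"]) < 14:
        return None
    return struct.unpack_from("!H", pkt["data"], 12)[0]
```

With something like this, a caller gets a row for every record, can filter on is_corrupt, and never trips an exception by decoding fields from a damaged packet.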
— C



> On Feb 10, 2019, at 10:54, Ted Dunning <[email protected]> wrote:
> 
> I think that accessing fields in corrupted packets will also cause
> exceptions. But this is a great start. Conditionalizing field access on
> !is_corrupt() might be sufficient for the next step.
> 
> 
> 
> On Sun, Feb 10, 2019 at 4:58 AM Charles Givre <[email protected]> wrote:
> 
>> All,
>> I posted the following PR for this issue:
>> https://github.com/apache/drill/pull/1637
>> 
>> Basically this PR does two things.
>> 1.  It creates a boolean column called is_corrupt and
>> 2.  If the PCAP file has a corrupt row, it marks that row as corrupt by
>> setting is_corrupt to true and keeps going
>> 
>> With the example from Giovanni, I was able to find 590 or so corrupt rows
>> out of 7000 in that PCAP file.  It was late and I don’t know if that was
>> what it was supposed to find, but it worked and I was able to query that.
>> If you guys could send a few more examples, I’d like to test this on other
>> files to make sure it works with them.  We’re also going to have to do the
>> same thing for the PCAP-NG format I would assume.
>> 
>>> On Feb 10, 2019, at 03:07, Ted Dunning <[email protected]> wrote:
>>> 
>>> On Sat, Feb 9, 2019 at 2:25 PM Bob Rudis <[email protected]> wrote:
>>> 
>>>> ...
>>>> And, I did indeed find a few and am just waiting for a formal review so I
>>>> can submit them for the Drill dev & tests.
>>>> 
>>> 
>>> Awesome!
>> 
>> 
