I am reading a MongoDB dump file in Drill. On the surface it seems to be
working well; however, I need to troubleshoot something, and I was curious
about the best way to approach it. Here are some details:


1. It's a large file: 1.2 GB compressed. It's named mondodump.json.gz and
Drill seems to be (on the surface) handling that correctly.
2. It's Drill 1.1 (MapR package).
3. select * from `/pathto/*` limit 10 seems to work. In this case the _id
field holds IP addresses (long story).
4. If I run select * from `/pathto/*` where `_id` = '123.123.123.123'
(an _id that was returned by the select * limit 10 query from #3), it finds
the record, and all is well.
5. If I run select * from `/pathto/*` where `_id` = '127.0.0.1', which I
know to be in the data (validated with zgrep), it does NOT find the data.
Based on the results from zgrep, it should find it. I am not sure if there
is something weird in how the data is read, but it's not throwing errors.
6. select count(*) from `/pathto/*` returns the same number as zcat
source.json.gz|wc -l. This is interesting because it apparently means things
are lined up, but why isn't that IP showing?
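
For reference, the zgrep and wc -l checks in items 5 and 6 can be
approximated in one pass with a short Python script. This is only a sketch:
it assumes the dump is one JSON document per line, and the function name
inspect_ids and the path in the usage note are placeholders, not anything
from Drill. Printing repr() of the _id value should expose trailing
whitespace, hidden characters, or an _id that is not a plain string.

```python
import gzip
import json


def inspect_ids(path, needle):
    """Scan a gzipped, line-delimited JSON file (assumed layout).

    Returns (total_lines, matches), where matches is a list of
    (line_number, repr_of_id) for every line whose raw text contains needle.
    """
    total = 0
    matches = []
    # errors="replace" keeps the scan going even if a line has bad bytes.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for lineno, line in enumerate(fh, start=1):
            total += 1
            if needle not in line:
                continue
            try:
                record = json.loads(line)
            except ValueError:
                # The raw text matched, but the line is not valid JSON.
                matches.append((lineno, "<unparseable line>"))
                continue
            _id = record.get("_id") if isinstance(record, dict) else None
            # repr() makes stray spaces and control characters visible.
            matches.append((lineno, repr(_id)))
    return total, matches
```

Calling inspect_ids("/pathto/mondodump.json.gz", "127.0.0.1") and printing
the matches would show whether the stored _id is exactly '127.0.0.1' or only
contains it, and whether the IP sits in _id at all or in some other field.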

So my question is this: Is there anything in Drill that would cause it to
miss that record? Weird characters, etc.? I know it's hard, but with a
1.2 GB compressed file, how would one troubleshoot this?
