Hey John,

If you want to, you can download the binaries for the 1.3 release candidate from [1] and see if you can reproduce the error there. You just need to unzip the folder and run "bin/drill-embedded".
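If it helps to narrow things down before trying the release candidate, here is a rough Python sketch for pulling the suspect record (plus a neighbour on each side) out of a one-record-per-line JSON file, so it can be inspected or queried on its own in embedded mode. The function names are made up for illustration, and it assumes that "Record 2402" is a 1-based line number and that every line is a JSON object; neither has been confirmed in this thread.

```python
import json
from itertools import islice


def read_record(path, record_number):
    """Return the raw text of one record from a one-record-per-line JSON file.

    Assumption: record_number is a 1-based line number. Whether Drill's
    "Record N" counts the same way is not confirmed.
    """
    with open(path, encoding="utf-8") as handle:
        line = next(islice(handle, record_number - 1, record_number), None)
    if line is None:
        raise ValueError(f"{path} has fewer than {record_number} lines")
    return line.rstrip("\n")


def describe_fields(raw_record):
    """Map each top-level field to its type; lists also report element types.

    Assumption: the record is a JSON object (not an array or scalar).
    """
    summary = {}
    for name, value in json.loads(raw_record).items():
        if isinstance(value, list):
            element_types = sorted({type(item).__name__ for item in value})
            summary[name] = "list[" + ", ".join(element_types) + "]"
        else:
            summary[name] = type(value).__name__
    return summary


def write_sample(path, record_number, out_path, context=1):
    """Copy the record and `context` lines on each side of it to out_path."""
    first = max(record_number - context, 1)
    last = record_number + context
    with open(path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for line in islice(src, first - 1, last):
            dst.write(line)


# Example call (paths are placeholders):
#   print(describe_fields(read_record("my_load.json", 2402)))
#   write_sample("my_load.json", 2402, "sample.json", context=1)
```

A list that reports mixed element types (for example a null followed by strings) would be the first thing to look at, and the small sample file is easier to scrub and share than the full load.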
Without some data to reproduce the issue, it's really hard to come up with an explanation.

Thanks

[1] http://people.apache.org/~jacques/apache-drill-1.3.0.rc0/

On Thu, Nov 5, 2015 at 5:12 AM, John Omernik <[email protected]> wrote:

> Hey Steven, I will look into that. Based on your understanding of the
> problem, would DRILL-4006 still apply given these conditions?
>
> 1. When I query a directory of JSON files, it fails, signaling a
> specific JSON file as the culprit. When I remove that file, it works, and
> when I run a query on only that culprit JSON file, it works as well.
> 2. When the error occurs, if I restart my drill bits and run the query
> again, it seems to work. (This one baffles me.)
>
> I will look to try the 1.3 release. I am using the 1.2.1 release from MapR, so
> I may have to wait until they roll a package for easy install (I want to
> include their MapRDB support).
>
> MapR team: if you have a current release with DRILL-4006 incorporated
> and possibly the JDBC storage plugin fixes rolled in for testing, I'd love to
> give it a shot (non-supported, of course).
>
> On Thu, Nov 5, 2015 at 12:48 AM, Steven Phillips <[email protected]>
> wrote:
>
> > This looks like DRILL-4006, a fix for which just went in.
> >
> > https://issues.apache.org/jira/browse/DRILL-4006
> >
> > On Wed, Nov 4, 2015 at 12:16 PM, John Omernik <[email protected]> wrote:
> >
> > > I am on MapR's 1.2.1 package.
> > >
> > > On Wed, Nov 4, 2015 at 2:14 PM, Abdel Hakim Deneche <[email protected]>
> > > wrote:
> > >
> > > > One last thing: what version of Drill do you have installed?
> > > >
> > > > On Wed, Nov 4, 2015 at 11:04 AM, John Omernik <[email protected]> wrote:
> > > >
> > > > > No, I don't think so. I am running Drill in Marathon on Mesos, so my
> > > > > startup settings are all very static.
> > > > > In addition, the only session
> > > > > variable I changed was the JSON-as-text option at the session level, and
> > > > > I was setting it in both the pre-reboot and the post-reboot drillbit
> > > > > sessions (I need that to query the data).
> > > > >
> > > > > On Wed, Nov 4, 2015 at 12:46 PM, Abdel Hakim Deneche <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > This is strange indeed. The error message you reported earlier doesn't
> > > > > > suggest a memory leak but rather a bug when reading a specific set of
> > > > > > data.
> > > > > > Could it be that you changed some session options and forgot to set
> > > > > > them again after you restarted the drillbits?
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > On Wed, Nov 4, 2015 at 10:37 AM, John Omernik <[email protected]> wrote:
> > > > > >
> > > > > > > So I pulled out the files that seemed to be causing this issue (I was
> > > > > > > up to two) and loaded my data. (See my other posts on how I did that
> > > > > > > by loading into a folder prefixed by ".")
> > > > > > >
> > > > > > > Anywho, my Drill cluster became unstable in general, and I was not able to
> > > > > > > run any queries until I bounced my drill bits.
> > > > > > >
> > > > > > > I did that, got my process working again, and went to try
> > > > > > > troubleshooting this problem again, and everything appears to be working
> > > > > > > well now. I am stumped. Could a memory leak have caused that error only
> > > > > > > on some files? I am monitoring now to determine if the problem starts
> > > > > > > again, but that is REALLY strange to me. This seems out of character for
> > > > > > > Drill, both in my use of it and in how its memory handling has been
> > > > > > > explained to me.
> > > > > > > If I get the error again, I'll make sure to set that so I get a
> > > > > > > full stack trace.
> > > > > > >
> > > > > > > John
> > > > > > >
> > > > > > > On Wed, Nov 4, 2015 at 12:13 PM, Abdel Hakim Deneche <[email protected]>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > The error message "index: 9604, length: 4 (expected: range(0, 8192))"
> > > > > > > > suggests an error happened when Drill tried to access a memory buffer
> > > > > > > > (most likely while writing an int or float value).
> > > > > > > > This may be a bug actually exposed by that particular data record.
> > > > > > > >
> > > > > > > > You can try enabling verbose error logging before running the query
> > > > > > > > again:
> > > > > > > >
> > > > > > > > set `exec.errors.verbose`=true;
> > > > > > > >
> > > > > > > > This should give us a nice stack trace about this error.
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > > On Wed, Nov 4, 2015 at 7:29 AM, John Omernik <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > There are multiple fields in that record, including two lists. Both lists
> > > > > > > > > have data in them. (I am now running in JSON text mode because at times
> > > > > > > > > the first value is a JSON null, but in these cases that should be turned
> > > > > > > > > into the string "null", if I am understanding things correctly, and
> > > > > > > > > shouldn't be causing a problem.)
> > > > > > > > >
> > > > > > > > > On Wed, Nov 4, 2015 at 9:21 AM, Hsuan Yi Chu <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > What is the data type for that record in line 2402? A list?
> > > > > > > > > >
> > > > > > > > > > Do you think it could be similar to this issue?
> > > > > > > > > > https://issues.apache.org/jira/browse/DRILL-4006
> > > > > > > > > >
> > > > > > > > > > On Wed, Nov 4, 2015 at 6:48 AM, John Omernik <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hey all,
> > > > > > > > > > >
> > > > > > > > > > > I am working with JSON that is on the whole fairly clean. I am trying to
> > > > > > > > > > > load it into Parquet files, and the previous day's worth of data worked just
> > > > > > > > > > > fine, but today's data has something wrong with it and I can't figure out
> > > > > > > > > > > what it is. Unfortunately, I can't post the data, which I know makes this
> > > > > > > > > > > hard to troubleshoot for the community. Hopefully I can provide some info
> > > > > > > > > > > here, get some pointers on where to look, and then report back on how
> > > > > > > > > > > we could potentially improve the error messages.
> > > > > > > > > > >
> > > > > > > > > > > The error is below.
> > > > > > > > > > >
> > > > > > > > > > > I am looking to figure out, given the information reported, where I'd look
> > > > > > > > > > > to troubleshoot this. Obviously the file 02ffc306e877_my_load_1446640931.json
> > > > > > > > > > > is where I am looking to start.
> > > > > > > > > > >
> > > > > > > > > > > This file has 3000 lines (records of data), so it's somewhere in
> > > > > > > > > > > between.
> > > > > > > > > > >
> > > > > > > > > > > The index/length/expected range don't mean anything to me. I could use some
> > > > > > > > > > > help there, because I am not even sure what I am looking for.
> > > > > > > > > > >
> > > > > > > > > > > The record and/or Fragment... do those help me dig in?
> > > > > > > > > > >
> > > > > > > > > > > Since this is one record per line, I went to line 2402, but that record
> > > > > > > > > > > looks completely normal to me (like all the other ones). Since this is
> > > > > > > > > > > dense text, I am obviously missing something, but is the record the line
> > > > > > > > > > > number?
> > > > > > > > > > >
> > > > > > > > > > > Any other pointers I can use to troubleshoot this?
> > > > > > > > > > >
> > > > > > > > > > > Thanks!
> > > > > > > > > > >
> > > > > > > > > > > Error:
> > > > > > > > > > >
> > > > > > > > > > > Caused by: org.apache.drill.common.exceptions.UserRemoteException:
> > > > > > > > > > > DATA_READ ERROR: Error parsing JSON - index: 9604, length: 4 (expected: range(0, 8192))
> > > > > > > > > > >
> > > > > > > > > > > File
> > > > > > > > > > > /etl/dev/my-metadata/mysqspull/loads/2015-11-04/02ffc306e877_my_load_1446640931.json
> > > > > > > > > > >
> > > > > > > > > > > Record 2402
> > > > > > > > > > >
> > > > > > > > > > > Fragment 1:5
> > > > > > > >
> > > > > > > > --
> > > > > > > > Abdelhakim Deneche
> > > > > > > > Software Engineer
> > > > > > > > <http://www.mapr.com/>
> > > > > > > >
> > > > > > > > Now Available - Free Hadoop On-Demand Training
> > > > > > > > <http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>

--
Abdelhakim Deneche
Software Engineer
<http://www.mapr.com/>
