After Ted’s feedback I’ve reimplemented the JSON validator using Jackson.
This is the same library Drill uses to read JSON, so it should help you
check whether the data you have on disk is valid JSON.

I wouldn’t call myself a Java developer by any means, so if there are things
I overlooked or improvements you want to add, by all means send a pull
request.

Future enhancements might include:

   - Counting how many records are valid
   - Maybe an “hdfs” flag to access data stored in HDFS/MapR-FS
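
For the record-counting idea, here is a rough sketch of the logic. It is
written in Python with the standard library only, not Jackson, so treat it
as a stand-in: the function name count_json_records is hypothetical, it is
not code from the repository, and Python's parser may differ from Jackson's
on edge cases. It reads records as a stream, so a record may span several
lines.

```python
import json


def count_json_records(text):
    """Count whitespace-separated JSON values in text.

    Returns (valid_count, error). error is None if everything parsed,
    otherwise a (line_number, message) tuple for the first bad record.
    """
    decoder = json.JSONDecoder()
    pos = 0
    count = 0
    end = len(text)
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < end and text[pos] in " \t\r\n":
            pos += 1
        if pos >= end:
            return count, None
        try:
            # Parse one value starting at pos; get the index just past it.
            _, pos = decoder.raw_decode(text, pos)
        except json.JSONDecodeError as exc:
            # Stop at the first error and report where it happened.
            return count, (exc.lineno, exc.msg)
        count += 1
```

This version stops at the first bad record and needs the whole input in
memory, so it is only meant to show the counting logic. A real
implementation for large files would use a streaming parser.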

https://github.com/cjmatta/jsonvalidator

Chris Matta
[email protected]
215-701-3146

On Tue, Nov 25, 2014 at 7:31 AM, Christopher Matta <[email protected]> wrote:

> Ted, I'll take a look! Thanks.
>
> Chris Matta
> [email protected]
> 215-701-3146
>
> On Tue, Nov 25, 2014 at 6:27 AM, Ted Dunning <[email protected]>
> wrote:
>
>> Chris,
>>
>> Your tool could be updated to use Jackson and would then have the exact
>> same semantics as Drill.
>>
>> It is still great as it is.... just could be slightly greater.
>>
>>
>> On Mon, Nov 24, 2014 at 11:09 PM, Steven Phillips <[email protected]>
>> wrote:
>>
>> > No, Drill uses Jackson to parse the JSON as a stream. It's fine if the
>> > JSON record has newline characters.
>> >
>> > Your validation tool is still useful in the case where each JSON
>> > record is contained in a single line, which is common. Just be aware
>> > that it won't work in all cases.
>> >
>> > On Mon, Nov 24, 2014 at 3:04 PM, Christopher Matta <[email protected]>
>> > wrote:
>> >
>> > > Steven,
>> > > Yes, it does. Doesn't Drill also require that the entire JSON record
>> > > be on a single line?
>> > >
>> > > I wrote this for situations when the data set is too large to paste
>> > > into a web-based validator.
>> > >
>> > > Chris Matta
>> > > [email protected]
>> > > 215-701-3146
>> > >
>> > > On Mon, Nov 24, 2014 at 6:01 PM, Steven Phillips <[email protected]>
>> > > wrote:
>> > >
>> > > > Christopher,
>> > > >
>> > > > Does your validator require that the entire JSON record be on a
>> > > > single line?
>> > > >
>> > > > On Mon, Nov 24, 2014 at 2:57 PM, Aman Sinha <[email protected]>
>> > > > wrote:
>> > > >
>> > > > > BTW, there's a web-based validator called jsonlint.com whose
>> > > > > source is available at: https://github.com/arc90/jsonlintdotcom
>> > > > >
>> > > > > On Mon, Nov 24, 2014 at 2:07 PM, Christopher Matta <[email protected]>
>> > > > > wrote:
>> > > > >
>> > > > > > I’ve been running across errors in Drill when a JSON record is
>> > > > > > invalid. To reduce the number of these errors, I wrote this small,
>> > > > > > simple application that will open a specified file, check if each
>> > > > > > line is a valid JSON record, and error if it’s not:
>> > > > > >
>> > > > > > https://github.com/cjmatta/jsonr
>> > > > > >
>> > > > > > Usage:
>> > > > > >
>> > > > > > [cmatta@ip-172-16-1-173 jsonar]$ ./jsonar -f ../tweets/2014/11/24/21/tweets.json
>> > > > > > Checking for valid JSON in ../tweets/2014/11/24/21/tweets.json
>> > > > > > WARNING:root:JSON load error on line 16640 of ../tweets/2014/11/24/21/tweets.json
>> > > > > > WARNING:root:JSON load error on line 16641 of ../tweets/2014/11/24/21/tweets.json
>> > > > > > WARNING:root:JSON load error on line 16642 of ../tweets/2014/11/24/21/tweets.json
>> > > > > > Checking line 17000
>> > > > > > Done.
>> > > > > >
>> > > > > > Please check it out, use it, contribute back if there’s
>> > > > > > something broken or missing.
>> > > > > >
>> > > > > > Chris Matta
>> > > > > > [email protected]
>> > > > > > 215-701-3146
>> > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > > >
>> > > >
>> > > > --
>> > > >  Steven Phillips
>> > > >  Software Engineer
>> > > >
>> > > >  mapr.com
>> > > >
>> > >
>> >
>> >
>> >
>> > --
>> >  Steven Phillips
>> >  Software Engineer
>> >
>> >  mapr.com
>> >
>>
>
>
