[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2017-11-11 Thread paul-rogers
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/518 Further testing revealed limitations in the underlying Jackson parser. That parser will not recover from other errors such as: ``` { a: } ``` See

[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-10-14 Thread parthchandra
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/518 +1. Looks like there has been enough review and there is good enough reason to merge this in --- If your project is set up for it, you can reply to this email and have your reply appear on

[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-10-08 Thread kfaraaz
Github user kfaraaz commented on the issue: https://github.com/apache/drill/pull/518 The below JSON is invalid, due to presence of duplicate key 'key'. Today Drill returns a DATA_READ error, does your proposed fix handle this case too ? [root@centos-01 ~]# cat f1.json

[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-10-07 Thread ssriniva123
Github user ssriniva123 commented on the issue: https://github.com/apache/drill/pull/518 I have a similar data set checked in. Hope that is good enough. On Fri, Oct 7, 2016 at 10:00 AM, Paul Rogers wrote: > Base data set: >

[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-10-07 Thread paul-rogers
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/518 Base data set: { "p": 1, "a": { "x": 10, "y": 20, "z": 30 }, "b": 50, "c": 60 } { "p": 2, "a": { "x": 11, "y": 21, "z": 31 }, "b": 51, "c": 61 } { "p": 3, "a": { "x":

[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-10-07 Thread zfong
Github user zfong commented on the issue: https://github.com/apache/drill/pull/518 @paul-rogers - since you had concerns about particular test cases, and it looks like you've confirmed that those are non-issues, would it make sense for you to share those with @ssriniva123 and he can

[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-10-06 Thread paul-rogers
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/518 Ran some tests. The results look good. In particular, files with nested structures produced the correct results. Since it was the nested structure case that had me a bit worried, looks like the

[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-09-27 Thread ssriniva123
Github user ssriniva123 commented on the issue: https://github.com/apache/drill/pull/518 Paul, The code you have listed is semantically equivalent to that of what I already I have submitted for pull and will not solve handling of all malformed json records. Also the code for

[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-09-27 Thread paul-rogers
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/518 The open question was how we can discard a partly-built record during recovery. As far as I can tell (veterans, please correct me), the JSONRecordReader keeps track of the record count. So, all

[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-09-26 Thread paul-rogers
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/518 As it turns out, the sample code shown was actually tested with a stock Jackson JSON parser: it does work. No parser changes are needed. The issue is not whether we can make the parser do

[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-09-26 Thread ssriniva123
Github user ssriniva123 commented on the issue: https://github.com/apache/drill/pull/518 There is not much to do except change the JSON parser to support this functionality. - Indicate to the parser that a current record terminates when it encounters a \n (Of course this

[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-09-21 Thread paul-rogers
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/518 Looks like you are right; the JsonParser is more than a simple tokenizer. We're not the first to try this:

[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-09-19 Thread ssriniva123
Github user ssriniva123 commented on the issue: https://github.com/apache/drill/pull/518 Apologize for getting back on this thread late, got tied up with some issues@work. Paul, The json parser is not just a tokenizer, it keeps track of the JSON structure and understands

[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-09-13 Thread paul-rogers
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/518 Poking around in the code a bit more, it looks like we rely on the Jackson JsonParser which delivers a stream of tokens. Here's the code in JsonReader: public ReadState

[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-09-13 Thread ssriniva123
Github user ssriniva123 commented on the issue: https://github.com/apache/drill/pull/518 Paul, Thanks for taking the time out in writing out a detailed email. Here are some of my thoughts. - Drill uses the com.fasterxml.jackson.core.json.UTF8StreamJsonParser for parsing

[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-09-12 Thread paul-rogers
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/518 Upon reflection, it seems that newline is not an adequate marker to separate JSON records. Many of our samples have internal newlines. If a newline appears inside the JSON record, then we are

[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-09-12 Thread paul-rogers
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/518 Thanks much for your contribution! Sorry it is taking a while to review the code. I've read over the code changes a couple of times. The options part is fine. However, the parser part

[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-09-12 Thread ssriniva123
Github user ssriniva123 commented on the issue: https://github.com/apache/drill/pull/518 Folks, I have tests on my local drill setup with various combinations of invalid json format. The only caveat is that some records may be skipped if JSON is not properly delimitted by }. Can

Re: [GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-08-23 Thread Subbu Srinivasan
Thanks Jason. I am in the process of doing more unit testing, corner case testing etc before submitting for final review. Thanks Subbu On Mon, Aug 22, 2016 at 9:51 AM, jaltekruse wrote: > Github user jaltekruse commented on the issue: > >

[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-08-22 Thread jaltekruse
Github user jaltekruse commented on the issue: https://github.com/apache/drill/pull/518 After this was discussed at the Hangout a few weeks back, I had been thinking about it more. My initial request was for warnings to be returned along with query results. Some initial work

Re: [GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-08-08 Thread Jason Altekruse
Hey Parth and Subbu, Sorry for missing the last message on this thread, I will be able to attend the hangout tomorrow to discuss my concern. As I had said previously, I am mostly trying to make sure that there is agreement about the impact of this change on user behavior and expectations before

Re: [GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-08-04 Thread Parth Chandra
Hi Subbu, Yes we can discuss this on the next hangout. If Jason is able to attend we can discuss some way to address his concern. Parth On Wed, Aug 3, 2016 at 10:24 AM, Subbu Srinivasan wrote: > Hi Folks, > When can we discuss this feature? Would next hangout be

Re: [GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-08-03 Thread Subbu Srinivasan
Hi Folks, When can we discuss this feature? Would next hangout be appropriate? Thanks Subbu On Mon, Jul 25, 2016 at 10:20 AM, Subbu Srinivasan wrote: > This mechanism falls in line with other JSON processing similar to serde's > with Hive, UDF's enabled at global level

Re: [GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-07-25 Thread Subbu Srinivasan
This mechanism falls in line with other JSON processing similar to serde's with Hive, UDF's enabled at global level will apply to all users and is outlined using documentation. What is your stance if we move to the JSONFormatPlugin? On Fri, Jul 15, 2016 at 2:08 PM, jaltekruse

[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-07-15 Thread jaltekruse
Github user jaltekruse commented on the issue: https://github.com/apache/drill/pull/518 I don't think we should merge this without a mechanism to return a warning to the user to tell them at least that some data was ignored, and ideally some indication of how much data was discarded.

[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-07-14 Thread chunhui-shi
Github user chunhui-shi commented on the issue: https://github.com/apache/drill/pull/518 Thanks for providing a way to skip bad json records and addressing my comments. Besides the minor improvements I suggested above: comment format changes that may be due to line width in your

[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-07-11 Thread jinfengni
Github user jinfengni commented on the issue: https://github.com/apache/drill/pull/518 @chunhui-shi , I saw you made comments days ago. Can you pls take a look at the new patch to see if it addressed your comment? thx. --- If your project is set up for it, you can reply to

[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-06-24 Thread ssriniva123
Github user ssriniva123 commented on the issue: https://github.com/apache/drill/pull/518 I have made several modifications to get accurate line nos: - The main change is to move the json parser to the end of the current record being processed since previously multiple

[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-06-22 Thread ssriniva123
Github user ssriniva123 commented on the issue: https://github.com/apache/drill/pull/518 What is the recommendation with this bug? Should we just remove the line nos for now and just name the file name. I was not able to work on this last couple of days, not sure if there is

[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-06-17 Thread amansinha100
Github user amansinha100 commented on the issue: https://github.com/apache/drill/pull/518 hmm.. I still see 4 commits in the pull request. Can you squash them into one ? (let me know if you need help with that). Also, the commit message needs to be in the format: "DRILL-4653:

[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-06-16 Thread amansinha100
Github user amansinha100 commented on the issue: https://github.com/apache/drill/pull/518 Looks much better. Sorry for the nitpick but I still have a couple more related to the coding conventions. :)Also, could you squash the commits into 1 and use the DRILL-: format for

[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-06-16 Thread ssriniva123
Github user ssriniva123 commented on the issue: https://github.com/apache/drill/pull/518 I have made changes are recommended by reviewers: - Changed JSON_READER_SKIP_INVALID_RECORDS_FLAG constant - Modified unit test to use builder framework - Code indendation changes.

[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-06-15 Thread ssriniva123
Github user ssriniva123 commented on the issue: https://github.com/apache/drill/pull/518 I updated the pull request with more changes requested by the reviewers. On Wed, Jun 15, 2016 at 5:28 PM, Aman Sinha wrote: > Yes, it does in fact

[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-06-15 Thread ssriniva123
Github user ssriniva123 commented on the issue: https://github.com/apache/drill/pull/518 For some reason the build terminated- What needs to be done to fire it again? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If

[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-06-15 Thread ssriniva123
Github user ssriniva123 commented on the issue: https://github.com/apache/drill/pull/518 I have added support so that it prints the offending file/line nos. Error parsing JSON in DRILL-4653.json : line nos :13 Error parsing JSON in DRILL-4653.json : line nos :14 --- If

[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-06-15 Thread ssriniva123
Github user ssriniva123 commented on the issue: https://github.com/apache/drill/pull/518 Yes, I looked at DRILL-3764. The changes made to JSONProcessing do not conflict with my changes. The only conflict I see is these two constants, I have defined my own. String

[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-06-15 Thread amansinha100
Github user amansinha100 commented on the issue: https://github.com/apache/drill/pull/518 Yes, it does in fact have a conflict with DRILL-3764 which has changes to the JsonRecordReader, although this issue is still in progress. I noticed that @adeneche mentioned this in the JIRA.

[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-06-15 Thread parthchandra
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/518 Does this conflict with Drill 3764 (see https://github.com/apache/drill/pull/497)?. I think there was a broader discussion around this as part of enabling the functionality of skipping/logging

[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-06-15 Thread ssriniva123
Github user ssriniva123 commented on the issue: https://github.com/apache/drill/pull/518 Do u anticipate this to be logged using log4j? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this

[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-06-15 Thread amansinha100
Github user amansinha100 commented on the issue: https://github.com/apache/drill/pull/518 A high level comment I have about this PR is that keeping the number of 'malformed' records in the json file may not be sufficient..I would expect the user would want to know the file name and