I hacked around this using Spark. Basically I had Spark read the files,
used Python's json module to parse each record, adjusted the dictionaries
into an array (adding the original key as a name field), and then had Spark
write the result out to Parquet. I also made a Drill view to clean up some
things. (One odd thing: when I created the array under a "myvalues" key,
instead of myvalues:[{dict1}, {dict2}] etc. it came out as
myvalues:{bag:[{dict1},{dict2}]},
which was weird. A Spark thing maybe? I don't have the string "bag" anywhere
in my code, so I am not sure how it got there.) But it's working!
The view I made on that data cleans it up (removes the bag etc.) and it
works well!
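
Roughly, the Spark side of it looked like the sketch below (from memory,
written against the DataFrame API; the paths, app name, and key names are
placeholders, not my real ones). It assumes one JSON object per line and
that the dynamic values are JSON objects, like in the sample records:

import json
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("json-fixup").getOrCreate()

FIXED_KEYS = {"_id", "created_on"}  # keys that are always present

def fixup(line):
    rec = json.loads(line)
    out = {k: rec[k] for k in FIXED_KEYS if k in rec}
    # Everything else is a data-as-key field; dict parsing doesn't care what
    # order the keys arrive in, which is what made positional access fail.
    # Keep the original key as a "name" field so it isn't lost.
    out["myvalues"] = [dict(v, name=k) for k, v in rec.items()
                       if k not in FIXED_KEYS]
    return json.dumps(out)

fixed = spark.sparkContext.textFile("/path/to/raw/*.json").map(fixup)
spark.read.json(fixed).write.parquet("/path/to/clean_parquet")
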
John
On Tue, Sep 22, 2015 at 7:35 AM, John Omernik <[email protected]> wrote:
> Tomer -
>
> I wrote a hacky Python script to pull out all of the distinct key names
> and came up with 263 values.
>
> Chris - This is an interesting problem. If there were some semantics in
> the JSON plugin to handle this, it would be helpful. The challenge, as I
> was thinking through it, is that Mongo isn't consistent in how it outputs
> the data. In the three records I included in the thread it's _id,
> created_on, and then the dynamic myvalue key, but looking through the data
> there are other records where created_on comes last, so even parsing the
> record as an array and referencing fields by position doesn't help. This
> is a challenge!
>
>
>
> On Mon, Sep 21, 2015 at 7:52 PM, Tomer Shiran <[email protected]> wrote:
>
>> How many different myvalueX keys are there in your dataset? (In the
>> example below you have 3.) Is it a small, known set, or could it be
>> anything?
>>
>>
>>
>> > On Sep 21, 2015, at 12:53 PM, John Omernik <[email protected]> wrote:
>> >
>> > Sorry about that, premature send. Here are some records. As you can
>> > see, the myvalue1-3 keys are at the top level of each record. Ideally
>> > I'd run kvgen on the myvalue records, but I have no way to address
>> > those; I tried kvgen() on * and that failed. Not sure how to address
>> > this in JSON. Yes, I know it's poorly formatted, but it's what I have
>> > been given.
>> >
>> >
>> >
>> >
>> >
>> > { "_id" : "127.0.0.1", "created_on" : "2014-02-18 14:52:23", "myvalue1"
>> : {
>> > "source" : "somestuff", "context" : "Context here", "last_seen" :
>> > "2014-02-11 00:00:00", "refreshed" : "2014-03-12 18:14:23" } }
>> > { "_id" : "127.0.0.2", "created_on" : "2014-02-18 14:52:08", "myvalue2"
>> : {
>> > "source" : "otherstuff", "context" : "Special context", "last_seen" :
>> > "2014-02-26 18:14:05", "refreshed" : "2014-02-26 18:14:05" } }
>> > { "_id" : "127.0.0.3", "created_on" : "2014-04-25 00:08:17", "myvalue3"
>> : {
>> > "source" : "oops", "context" : "Other Context, "last_seen" : "2014-04-25
>> > 05:32:08", "refreshed" : "2014-04-25 05:32:08" } }
>> >
>> >> On Mon, Sep 21, 2015 at 2:52 PM, John Omernik <[email protected]>
>> wrote:
>> >>
>> >> The challenge I have is that the data is poorly formatted; here are
>> >> some records:
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>> On Mon, Sep 21, 2015 at 2:14 PM, Jim Scott <[email protected]>
>> wrote:
>> >>>
>> >>> I do believe KVGEN will meet your needs:
>> >>> https://drill.apache.org/docs/kvgen/
>> >>>
>> >>>> On Mon, Sep 21, 2015 at 2:11 PM, John Omernik <[email protected]>
>> wrote:
>> >>>>
>> >>>> I have some poorly developed json where the developer used data for
>> key
>> >>>> names
>> >>>>
>> >>>> {"created":"2015-12-01", "ZYS":"BLAH"}
>> >>>> {"created":"2015-12-01", "ZYX":"BLAH"}
>> >>>> {"created":"2015-12-01", "ABC":"BLAH"}
>> >>>> {"created":"2015-12-01", "ADS":"BLAH"}
>> >>>>
>> >>>> I'd like to somehow map the key name to a value and give it a generic
>> >>> name
>> >>>>
>> >>>> select `created`, somemagic() as value1 from table
>> >>>>
>> >>>> Not sure how this would work, or if it's possible, or how I'd even
>> >>>> reference that, but thought I would ask.
>> >>>>
>> >>>> John
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> *Jim Scott*
>> >>> Director, Enterprise Strategy & Architecture
>> >>> +1 (347) 746-9281
>> >>> @kingmesal <https://twitter.com/kingmesal>
>> >>>
>> >>
>> >>
>>
>
>