Hi Igor,

I don't know that much about Hive so I can't really say what format it
needs to be in for Hive to understand it.

If it needs to be a valid JSON array, then in MergeContent change the
Delimiter Strategy to "Text", which tells the processor to use whatever
values you type directly into the Header, Footer, and Demarcator
properties. Set the Header to [ , the Footer to ] , and the Demarcator
to , respectively.

That will get you something like this where {...} are the incoming
documents:

[
{...},
{...}
]
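
As a quick sanity check (a hypothetical snippet, not part of the NiFi flow
itself), you can simulate what MergeContent produces with those settings and
confirm the result parses as valid JSON:

```python
import json

# Simulated MergeContent settings: Header "[", Demarcator ",", Footer "]"
header, demarcator, footer = "[", ",", "]"

# Two stand-in flowfile contents, i.e. the {...} documents above
tweets = ['{"id": 1, "msg": "hello"}', '{"id": 2, "msg": "world"}']

merged = header + demarcator.join(tweets) + footer
docs = json.loads(merged)  # raises ValueError if the merged text is not valid JSON
print(len(docs))  # 2
```

Note the demarcator goes between documents, so there is no trailing comma
before the footer.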

-Bryan


On Thu, Apr 21, 2016 at 4:06 PM, Igor Kravzov <[email protected]>
wrote:

> Hi Bryan,
>
> I am aware of this example, but I want to store the JSON as-is and create an
> external table, like in this example:
> http://hortonworks.com/blog/howto-use-hive-to-sqlize-your-own-tweets-part-two-loading-hive-sql-queries/
> What I don't know is how to properly merge multiple JSON documents into one
> file so that Hive can read it properly.
>
> On Thu, Apr 21, 2016 at 2:33 PM, Bryan Bende <[email protected]> wrote:
>
>> Hello,
>>
>> I believe this example shows an approach to do it (it includes Hive even
>> though the title is Solr/banana):
>>
>> https://community.hortonworks.com/articles/1282/sample-hdfnifi-flow-to-push-tweets-into-solrbanana.html
>>
>> The short version is that it extracts several attributes from each tweet
>> using EvaluateJsonPath, then uses ReplaceText to replace the FlowFile
>> content with a pipe delimited string of those attributes, and then creates
>> a Hive table that knows how to handle that delimiter. With this approach
>> you don't need to set the header, footer, and demarcator in MergeContent.
>>
>> create table if not exists tweets_text_partition(
>> tweet_id bigint,
>> created_unixtime bigint,
>> created_time string,
>> displayname string,
>> msg string,
>> fulltext string
>> )
>> row format delimited fields terminated by "|"
>> location "/tmp/tweets_staging";
>>
>> -Bryan
>>
>>
>> On Thu, Apr 21, 2016 at 1:52 PM, Igor Kravzov <[email protected]>
>> wrote:
>>
>>> Hi guys,
>>>
>>> I want to create a following workflow:
>>>
>>> 1.Fetch tweets using GetTwitter processor.
>>> 2.Merge tweets in a bigger file using MergeContent process.
>>> 3.Store merged files in HDFS.
>>> 4. On the hadoop/hive side I want to create an external table based on
>>> these tweets.
>>>
>>> There are examples of how to do this, but what I am missing is how to
>>> configure the MergeContent processor: what to set as the header, footer,
>>> and demarcator. And what to use on the Hive side as a separator so that it
>>> will split the merged tweets into rows. Hope I described myself clearly.
>>>
>>> Thanks in advance.
>>>
>>
>>
>
