Re: Apache NiFi/Hive - store merged tweets in HDFS, create table in hive

Joe Witt Thu, 21 Apr 2016 14:27:07 -0700

Run the output through UpdateAttribute and put a property on that
processor with a name of 'filename' and a value of
'${filename}.yourextension'


Thanks
Joe

On Thu, Apr 21, 2016 at 5:24 PM, Igor Kravzov <[email protected]> wrote:
> Thanks guys. I think it will work.
> One thing: the merged file comes out without an extension. How do I add an
> extension to a merged file?
>
> On Thu, Apr 21, 2016 at 4:42 PM, Simon Ball <[email protected]> wrote:
>>
>> For most hive JSON serdes you are going to want what some people call JSON
>> record format. This is essentially a text file with a JSON document per line
>> which represents a record, with reasonably consistent structure. You can
>> achieve this by ensuring your JSON is not pretty-printed (so each document
>> sits on one line) and then just using binary concatenation in the
>> MergeContent processor Bryan mentioned.
>>
>> Simon
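
As a rough illustration of the "JSON record format" described above, the following Python sketch writes one compact JSON document per line. It is not NiFi code; it only mimics what the merged file should look like. The sample tweets and the function name are hypothetical, and it assumes every document ends with a newline (the thread does not say whether incoming tweets already do).

```python
import json

def to_json_lines(documents):
    """Serialize each document compactly on its own line (JSON record format)."""
    # json.dumps without `indent` never emits raw newlines, so each record
    # stays on exactly one line even if a string value contains "\n".
    return "".join(
        json.dumps(doc, separators=(",", ":")) + "\n" for doc in documents
    )

# Hypothetical, heavily trimmed tweets; real tweets carry many more fields.
tweets = [
    {"id": 1, "text": "first tweet\nwith a line break"},
    {"id": 2, "text": "second tweet"},
]

merged = to_json_lines(tweets)
print(merged, end="")
```

Each line of the result can be parsed independently, which is what a line-oriented reader needs.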
>>
>>
>> On 21 Apr 2016, at 22:38, Bryan Bende <[email protected]> wrote:
>>
>> Also, this blog has a picture of what I described with MergeContent:
>>
>> https://blogs.apache.org/nifi/entry/indexing_tweets_with_nifi_and
>>
>> -Bryan
>>
>> On Thu, Apr 21, 2016 at 4:37 PM, Bryan Bende <[email protected]> wrote:
>>>
>>> Hi Igor,
>>>
>>> I don't know that much about Hive so I can't really say what format it
>>> needs to be in for Hive to understand it.
>>>
>>> If it needs to be a valid array of JSON documents, in MergeContent change
>>> the Delimiter Strategy to "Text", which means it will use whatever values you
>>> type directly into Header, Footer, and Demarcator, and then specify [ as the
>>> Header, ] as the Footer, and , as the Demarcator.
>>>
>>> That will get you something like this where {...} are the incoming
>>> documents:
>>>
>>> [
>>> {...},
>>> {...}
>>> ]
>>>
>>> -Bryan
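
To make the Header/Footer/Demarcator idea concrete, here is a small Python sketch that joins documents the same way and checks that the result parses as a JSON array. It only simulates the merge outside NiFi; the function name and sample documents are hypothetical.

```python
import json

def merge_with_delimiters(documents, header="[", footer="]", demarcator=","):
    """Wrap documents in header/footer and separate them with the demarcator."""
    # The demarcator is placed only between documents, so the last
    # document is not followed by a comma.
    return header + demarcator.join(documents) + footer

# Hypothetical incoming documents, one string per FlowFile.
docs = ['{"id": 1, "text": "first"}', '{"id": 2, "text": "second"}']

merged = merge_with_delimiters(docs)
parsed = json.loads(merged)  # raises ValueError if the result is not valid JSON
print(len(parsed))
```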
>>>
>>>
>>> On Thu, Apr 21, 2016 at 4:06 PM, Igor Kravzov <[email protected]>
>>> wrote:
>>>>
>>>> Hi Bryan,
>>>>
>>>> I am aware of this example. But I want to store the JSON as it is and
>>>> create an external table, like in this example:
>>>> http://hortonworks.com/blog/howto-use-hive-to-sqlize-your-own-tweets-part-two-loading-hive-sql-queries/
>>>> What I don't know is how to properly merge multiple JSON documents into
>>>> one file so that Hive can read it properly.
>>>>
>>>> On Thu, Apr 21, 2016 at 2:33 PM, Bryan Bende <[email protected]> wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> I believe this example shows an approach to do it (it includes Hive
>>>>> even though the title is Solr/banana):
>>>>>
>>>>> https://community.hortonworks.com/articles/1282/sample-hdfnifi-flow-to-push-tweets-into-solrbanana.html
>>>>>
>>>>> The short version is that it extracts several attributes from each
>>>>> tweet using EvaluateJsonPath, then uses ReplaceText to replace the
>>>>> FlowFile content with a pipe-delimited string of those attributes, and
>>>>> then creates a Hive table that knows how to handle that delimiter. With
>>>>> this approach you don't need to set the header, footer, and demarcator
>>>>> in MergeContent.
>>>>>
>>>>> create table if not exists tweets_text_partition(
>>>>> tweet_id bigint,
>>>>> created_unixtime bigint,
>>>>> created_time string,
>>>>> displayname string,
>>>>> msg string,
>>>>> fulltext string
>>>>> )
>>>>> row format delimited fields terminated by "|"
>>>>> location "/tmp/tweets_staging";
>>>>>
>>>>> -Bryan
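
For illustration, this Python sketch builds one pipe-delimited line in the column order of the table above. It is not the NiFi flow itself. The sample values are hypothetical, and replacing pipes and line breaks inside values is an added assumption that the thread does not mention.

```python
# Column order taken from the create table statement in the thread.
COLUMNS = [
    "tweet_id", "created_unixtime", "created_time",
    "displayname", "msg", "fulltext",
]

def to_pipe_row(attributes):
    """Build one pipe-delimited line whose fields follow the table's column order."""
    fields = []
    for name in COLUMNS:
        value = str(attributes.get(name, ""))
        # Assumption: replace characters that would break the row or field layout.
        value = value.replace("|", " ").replace("\r", " ").replace("\n", " ")
        fields.append(value)
    return "|".join(fields)

# Hypothetical attribute values for a single tweet.
attrs = {
    "tweet_id": 1,
    "created_unixtime": 1461272827,
    "created_time": "Thu Apr 21 21:07:07 +0000 2016",
    "displayname": "example_user",
    "msg": "hello | world",
    "fulltext": "hello | world\nsecond line",
}

row = to_pipe_row(attrs)
print(row)
```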
>>>>>
>>>>>
>>>>> On Thu, Apr 21, 2016 at 1:52 PM, Igor Kravzov <[email protected]>
>>>>> wrote:
>>>>>>
>>>>>> Hi guys,
>>>>>>
>>>>>> I want to create the following workflow:
>>>>>>
>>>>>> 1. Fetch tweets using the GetTwitter processor.
>>>>>> 2. Merge tweets into a bigger file using the MergeContent processor.
>>>>>> 3. Store merged files in HDFS.
>>>>>> 4. On the Hadoop/Hive side, create an external table based on
>>>>>> these tweets.
>>>>>>
>>>>>> There are examples of how to do this, but what I am missing is how to
>>>>>> configure the MergeContent processor: what to set as header, footer, and
>>>>>> demarcator, and what to use on the Hive side as a separator so that it will
>>>>>> split merged tweets into rows. I hope I described this clearly.
>>>>>>
>>>>>> Thanks in advance.
>>>>>
>>>>>
>>>>
>>>
>>
>
