Run the output through UpdateAttribute and put a property on that
processor with a name of 'filename' and a value of
'${filename}.yourextension'

Thanks
Joe

On Thu, Apr 21, 2016 at 5:24 PM, Igor Kravzov <[email protected]> wrote:
> Thanks guys. I think it will work.
> One thing: merged file comes out without extension. How do I add extension
> to a merged file?
>
> On Thu, Apr 21, 2016 at 4:42 PM, Simon Ball <[email protected]> wrote:
>>
>> For most hive JSON serdes you are going to want what some people call JSON
>> record format. This is essentially a text file with a JSON document per line
>> which represents a record, with reasonably consistent structure. You can
>> achieve this by ensuring your JSON is not pretty formatted (one doc per
>> line) and then just using binary concatenation in the MergeContent processor
>> Bryan mentioned.
>>
>> Simon
>>
>>
>> On 21 Apr 2016, at 22:38, Bryan Bende <[email protected]> wrote:
>>
>> Also, this blog has a picture of what I described with MergeContent:
>>
>> https://blogs.apache.org/nifi/entry/indexing_tweets_with_nifi_and
>>
>> -Bryan
>>
>> On Thu, Apr 21, 2016 at 4:37 PM, Bryan Bende <[email protected]> wrote:
>>>
>>> Hi Igor,
>>>
>>> I don't know that much about Hive so I can't really say what format it
>>> needs to be in for Hive to understand it.
>>>
>>> If it needs to be a valid array of JSON documents, in MergeContent change
>>> the Delimiter Strategy to "Text" which means it will use whatever values you
>>> type directly into Header, Footer, Demarcator, and then specify [ ] ,
>>> respectively as the values.
>>>
>>> That will get you something like this where {...} are the incoming
>>> documents:
>>>
>>> [
>>> {...},
>>> {...}
>>> ]
>>>
>>> -Bryan
>>>
>>>
>>> On Thu, Apr 21, 2016 at 4:06 PM, Igor Kravzov <[email protected]>
>>> wrote:
>>>>
>>>> Hi Bryan,
>>>>
>>>> I am aware of this example. But I want to store JSON as it is and create
>>>> external table. Like in this example.
>>>> http://hortonworks.com/blog/howto-use-hive-to-sqlize-your-own-tweets-part-two-loading-hive-sql-queries/
>>>> What I don't know is how to properly merge multiple JSON in one file in
>>>> order for hive to read it properly.
>>>>
>>>> On Thu, Apr 21, 2016 at 2:33 PM, Bryan Bende <[email protected]> wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> I believe this example shows an approach to do it (it includes Hive
>>>>> even though the title is Solr/banana):
>>>>>
>>>>> https://community.hortonworks.com/articles/1282/sample-hdfnifi-flow-to-push-tweets-into-solrbanana.html
>>>>>
>>>>> The short version is that it extracts several attributes from each
>>>>> tweet using EvaluateJsonPath, then uses ReplaceText to replace the
>>>>> FlowFile content with a pipe delimited string of those attributes,
>>>>> and then creates a Hive table that knows how to handle that delimiter.
>>>>> With this approach you don't need to set the header, footer, and
>>>>> demarcator in MergeContent.
>>>>>
>>>>> create table if not exists tweets_text_partition(
>>>>>   tweet_id bigint,
>>>>>   created_unixtime bigint,
>>>>>   created_time string,
>>>>>   displayname string,
>>>>>   msg string,
>>>>>   fulltext string
>>>>> )
>>>>> row format delimited fields terminated by "|"
>>>>> location "/tmp/tweets_staging";
>>>>>
>>>>> -Bryan
>>>>>
>>>>>
>>>>> On Thu, Apr 21, 2016 at 1:52 PM, Igor Kravzov <[email protected]>
>>>>> wrote:
>>>>>>
>>>>>> Hi guys,
>>>>>>
>>>>>> I want to create the following workflow:
>>>>>>
>>>>>> 1. Fetch tweets using GetTwitter processor.
>>>>>> 2. Merge tweets in a bigger file using MergeContent processor.
>>>>>> 3. Store merged files in HDFS.
>>>>>> 4. On the hadoop/hive side I want to create an external table based on
>>>>>> these tweets.
>>>>>>
>>>>>> There are examples how to do this but what I am missing is how to
>>>>>> configure MergeContent processor: what to set as header, footer and
>>>>>> demarcator. And what to use on hive side as separator so that it will
>>>>>> split merged tweets in rows. Hope I described myself clearly.
>>>>>>
>>>>>> Thanks in advance.
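
For anyone reading this thread later, here is a rough sketch of the two merged layouts discussed above. It is plain Python, not NiFi code: merge_documents is a hypothetical helper that only imitates placing a header first, a demarcator between documents, and a footer last, and the two sample tweets are made up. It assumes a newline demarcator for the one-document-per-line layout.

```python
import json


def merge_documents(docs, header="", footer="", demarcator=""):
    """Hypothetical helper: header first, demarcator between documents
    only (never after the last one), footer last."""
    return header + demarcator.join(docs) + footer


# Made-up stand-ins for incoming tweets, serialized compactly (not pretty
# printed) so that each document stays on a single line.
tweets = [{"id": 1, "text": "first"}, {"id": 2, "text": "second"}]
docs = [json.dumps(t, separators=(",", ":")) for t in tweets]

# Layout 1: one JSON document per line ("JSON record format").
json_lines = merge_documents(docs, demarcator="\n")

# Layout 2: a single valid JSON array, using [ ] , as header, footer
# and demarcator.
json_array = merge_documents(docs, header="[", footer="]", demarcator=",")

# Both layouts parse back to the original documents.
assert [json.loads(line) for line in json_lines.splitlines()] == tweets
assert json.loads(json_array) == tweets
```

The first layout is the one-record-per-line shape described for Hive JSON serdes; the second only matters if whatever reads the file needs a single valid JSON array.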
