Pat,

GetTwitter does produce single-tweet outputs, and I believe the example
you provided is actually a single tweet.  Try pasting it into a JSON
formatter and I think you'll find it is one JSON object.  'created_at'
is used on several entity types in Twitter, it appears (the tweet
itself, each user, and any embedded retweeted or quoted status), so it
isn't telling you about the start of a new tweet.
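
As a rough way to check this outside NiFi, here is a minimal Python
sketch.  SAMPLE is a shortened, made-up stand-in that only borrows field
names from the example in your message, and created_at_paths is a
hypothetical helper, not anything NiFi provides.  It parses the text as
one JSON document and lists where every 'created_at' key sits.

```python
import json

# Shortened, hypothetical stand-in for one streamed tweet. Field names follow
# the example in the quoted message; the values are made up.
SAMPLE = '''{"created_at": "Sun Jun 12 18:15:58 +0000 2016", "id": 1,
 "text": "RT @someone: hello",
 "user": {"id": 10, "screen_name": "alice",
          "created_at": "Mon May 19 06:28:58 +0000 2008"},
 "retweeted_status": {"created_at": "Sun Jun 12 17:41:25 +0000 2016",
   "id": 2, "text": "hello",
   "user": {"id": 20, "screen_name": "someone",
            "created_at": "Mon Jan 19 15:05:09 +0000 2009"}}}'''


def created_at_paths(node, path="$"):
    """Yield the path of every 'created_at' key anywhere in the document."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key == "created_at":
                yield child
            yield from created_at_paths(value, child)
    elif isinstance(node, list):
        for i, value in enumerate(node):
            yield from created_at_paths(value, f"{path}[{i}]")


# json.loads raises ValueError if the text holds several concatenated
# objects, so a successful parse means there is exactly one top-level value.
tweet = json.loads(SAMPLE)
paths = list(created_at_paths(tweet))

for p in paths:
    print(p)
```

In this sample only the first path, $.created_at, belongs to the tweet
itself; the others belong to the user objects and the embedded retweet.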

Now, the size of a single tweet can certainly vary, since it depends on
the mentions and users found within the tweet, whose information is
bundled along with it.  Looking at a feed of tweets over the past 4
minutes alone, I see variation from 154 bytes to 13.5 KB.  Some items
will also include things like retweets and so on.  You can see more
about this general pattern here [1].

Now, it sounds like you don't want the full bundle of data being sent
to Kafka.  You'll want to produce some projection of the tweet then.
You can do this by extracting attributes and using ReplaceText to
write out your own structure (could even be JSON).  Or you can use one
of the script executors to create your own transformation.
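
To give a feel for what such a projection might look like, here is a
small standalone Python sketch.  The function name project_tweet and the
choice of fields are my assumptions for illustration; the field names
themselves come from the tweet in your message, and the text value in
SAMPLE is shortened and made up.  Wiring this into a script executor is
not shown.

```python
import json


def project_tweet(raw):
    """Return a compact JSON string holding only a few fields of one tweet."""
    tweet = json.loads(raw)
    user = tweet.get("user") or {}  # tolerate a missing or null user object
    projection = {
        "id": tweet.get("id_str"),
        "created_at": tweet.get("created_at"),
        "text": tweet.get("text"),
        "lang": tweet.get("lang"),
        "user": user.get("screen_name"),
        # True when the tweet embeds the original status it retweets.
        "is_retweet": "retweeted_status" in tweet,
    }
    return json.dumps(projection, ensure_ascii=False)


# Shortened sample; everything not listed in the projection is dropped.
SAMPLE = json.dumps({
    "created_at": "Sun Jun 12 18:15:58 +0000 2016",
    "id_str": "742057874155675648",
    "text": "RT @NC_N8: example text",
    "lang": "en",
    "user": {"screen_name": "Weezette", "followers_count": 235},
    "retweeted_status": {"id_str": "742049180630274048"},
})

compact = project_tweet(SAMPLE)
print(compact)
```

The result is one small JSON object per tweet, which is the kind of
thing you would then hand to PutKafka instead of the full bundle.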

Lots of ways to tackle this.  Does any one of these approaches sound
like it would do the trick for your case?

[1] https://cwiki.apache.org/confluence/display/NIFI/Example+Dataflow+Templates

Thanks
Joe

On Sun, Jun 12, 2016 at 10:37 PM, Pat Trainor <pat.trai...@gmail.com> wrote:
> Most every single example I see rolls the OAuth from scratch... But the
> GetTwitter processor spits out JSON, and sometimes multiple tweets at a time
> within the {}... I am new to JSON, but it doesn't look that hard...
>
> I've tried a few libraries, including twitter4j, which I like a lot, but
> there is nothing in an example of how to split up a stream file of more than
> one... Lots of time in java scratch projects... I thought I'd ask the
> community what is most commonly used as an approach...
>
> Perhaps a Nifi splitter of some sort, then a JSON parser?
>
> I'm just curious about how you are getting down to each tweet's properties
> using GetTwitter instead of rolling your own... Even knowing which
> libraries/tools to dissect would be a big help.
>
> I'm using GetTwitter to create real world content for the NLP parsers I'm
> writing. Right now, the approach is:
>
> GetTwitter -->  [magic that splits tweet 'bundles' from stream] --> PutKafka
>
> Right now, there is a template like:
>
> GetTwitter --> EvaluateJsonPath --> RouteOnAttribute --> PutKafka
>
> GetTwitter: Grab Garden Hose
> EvaluateJsonPath: Pull Key Attributes
> RouteOnAttribute: Find only Tweets
> ...from a template I downloaded from Nifirocks, I think...
>
> ...But what is happening is that PutKafka is being given JSON clumps of 1,
> 2, 3, + Tweets... I really need those tweets broken out...
>
> This just came in, and has 4 separate tweets (search for "created_at")...
>
> {"created_at":"Sun Jun 12 18:15:58 +0000
> 2016","id":742057874155675648,"id_str":"742057874155675648","text":"RT
> @NC_N8: Burr received $7,900 from NRA in '10 election to fight against
> sensible gun legislation https:\/\/t.co\/m79Enjzzqx
> https:\/\/t.co\/mra\u2026","source":"\u003ca
> href=\"http:\/\/twitter.com\/#!\/download\/ipad\"
> rel=\"nofollow\"\u003eTwitter for
> iPad\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":14829396,"id_str":"14829396","name":"Weezette","screen_name":"Weezette","location":"SoCal","url":"http:\/\/crowandraven.blogspot.com\/","description":"California
> native. Birder. I love stockdogs, cur-dogs, western tanagers, CA back roads,
> log cabins, and breakfast in the
> garden.","protected":false,"verified":false,"followers_count":235,"friends_count":553,"listed_count":34,"favourites_count":18302,"statuses_count":6604,"created_at":"Mon
> May 19 06:28:58 +0000 2008","utc_offset":-25200,"time_zone":"Pacific Time
> (US &
> Canada)","geo_enabled":false,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"391F05","profile_background_image_url":"http:\/\/pbs.twimg.com\/profile_background_images\/727078200\/46629af0a73140f4977aaf3860d1da68.jpeg","profile_background_image_url_https":"https:\/\/pbs.twimg.com\/profile_background_images\/727078200\/46629af0a73140f4977aaf3860d1da68.jpeg","profile_background_tile":false,"profile_link_color":"FDAC1E","profile_sidebar_border_color":"000000","profile_sidebar_fill_color":"595140","profile_text_color":"000000","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/2843183703\/1ee5efd552b5a387fb15f4ca84496be3_normal.jpeg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/2843183703\/1ee5efd552b5a387fb15f4ca84496be3_normal.jpeg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/14829396\/1353308277","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweeted_status":{"created_at":"Sun
> Jun 12 17:41:25 +0000
> 2016","id":742049180630274048,"id_str":"742049180630274048","text":"Burr
> received $7,900 from NRA in '10 election to fight against sensible gun
> legislation https:\/\/t.co\/m79Enjzzqx
> https:\/\/t.co\/mraolX1MZv","source":"\u003ca href=\"http:\/\/twitter.com\"
> rel=\"nofollow\"\u003eTwitter Web
> Client\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":19185921,"id_str":"19185921","name":"(((Nate)))","screen_name":"NC_N8","location":"Greensboro,
> NC","url":"http:\/\/blog.aba.org","description":"Birds, beers, Cardinals
> baseball. Into birding before it was cool. Editor at the @ABA Blog. Author
> of bird
> books.","protected":false,"verified":false,"followers_count":1936,"friends_count":379,"listed_count":119,"favourites_count":173,"statuses_count":2560,"created_at":"Mon
> Jan 19 15:05:09 +0000 2009","utc_offset":-14400,"time_zone":"Eastern Time
> (US &
> Canada)","geo_enabled":true,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"ACDED6","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme18\/bg.gif","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme18\/bg.gif","profile_background_tile":false,"profile_link_color":"038543","profile_sidebar_border_color":"EEEEEE","profile_sidebar_fill_color":"F6F6F6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/684775132318052354\/uVc-zqun_normal.jpg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/684775132318052354\/uVc-zqun_normal.jpg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/19185921\/1464805878","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"quoted_status_id":742008835993444352,"quoted_status_id_str":"742008835993444352","quoted_status":{"created_at":"Sun
> Jun 12 15:01:07 +0000
> 2016","id":742008835993444352,"id_str":"742008835993444352","text":"My
> thoughts and prayers are with the victims of this morning's horrific attack
> in Orlando and their loved ones.","source":"\u003ca
> href=\"http:\/\/twitter.com\/download\/iphone\"
> rel=\"nofollow\"\u003eTwitter for
> iPhone\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":21157904,"id_str":"21157904","name":"Richard
> Burr","screen_name":"SenatorBurr","location":"Winston-Salem,
> NC","url":"http:\/\/burr.senate.gov","description":"U.S. Senator from North
> Carolina, Chairman of the Senate Select Committee on
> Intelligence","protected":false,"verified":true,"followers_count":37701,"friends_count":1988,"listed_count":2175,"favourites_count":21,"statuses_count":2445,"created_at":"Wed
> Feb 18 01:53:03 +0000
> 2009","utc_offset":-18000,"time_zone":"Quito","geo_enabled":true,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"C0DEED","profile_background_image_url":"http:\/\/pbs.twimg.com\/profile_background_images\/821443790\/958deacf216c26cd1a8e60fee00a2323.jpeg","profile_background_image_url_https":"https:\/\/pbs.twimg.com\/profile_background_images\/821443790\/958deacf216c26cd1a8e60fee00a2323.jpeg","profile_background_tile":true,"profile_link_color":"0084B4","profile_sidebar_border_color":"FFFFFF","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/588079658819653632\/wwdO4Eqv_normal.jpg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/588079658819653632\/wwdO4Eqv_normal.jpg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/21157904\/1402319779","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"is_quote_status":false,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"urls":[],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"filter_level":"low","lang":"en"},"is_quote_status":true,"retweet_count":1,"favorite_count":2,"entities":{"hashtags":[],"urls":[{"url":"https:\/\/t.co\/m79Enjzzqx","expanded_url":"http:\/\/bit.ly\/1VTuSAh","display_url":"bit.ly\/1VTuSAh","indices":[88,111]},{"url":"https:\/\/t.co\/mraolX1MZv","expanded_url":"https:\/\/twitter.com\/SenatorBurr\/status\/742008835993444352","display_url":"twitter.com\/SenatorBurr\/st\u2026","indices":[112,135]}],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"en"},"quoted_status_id":742008835993444352,"quoted_status_id_str":"742008835993444352","quoted_status":{"created_at":"Sun
> Jun 12 15:01:07 +0000
> 2016","id":742008835993444352,"id_str":"742008835993444352","text":"My
> thoughts and prayers are with the victims of this morning's horrific attack
> in Orlando and their loved ones.","source":"\u003ca
> href=\"http:\/\/twitter.com\/download\/iphone\"
> rel=\"nofollow\"\u003eTwitter for
> iPhone\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":21157904,"id_str":"21157904","name":"Richard
> Burr","screen_name":"SenatorBurr","location":"Winston-Salem,
> NC","url":"http:\/\/burr.senate.gov","description":"U.S. Senator from North
> Carolina, Chairman of the Senate Select Committee on
> Intelligence","protected":false,"verified":true,"followers_count":37701,"friends_count":1988,"listed_count":2175,"favourites_count":21,"statuses_count":2445,"created_at":"Wed
> Feb 18 01:53:03 +0000
> 2009","utc_offset":-18000,"time_zone":"Quito","geo_enabled":true,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"C0DEED","profile_background_image_url":"http:\/\/pbs.twimg.com\/profile_background_images\/821443790\/958deacf216c26cd1a8e60fee00a2323.jpeg","profile_background_image_url_https":"https:\/\/pbs.twimg.com\/profile_background_images\/821443790\/958deacf216c26cd1a8e60fee00a2323.jpeg","profile_background_tile":true,"profile_link_color":"0084B4","profile_sidebar_border_color":"FFFFFF","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/588079658819653632\/wwdO4Eqv_normal.jpg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/588079658819653632\/wwdO4Eqv_normal.jpg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/21157904\/1402319779","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"is_quote_status":false,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"urls":[],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"filter_level":"low","lang":"en"},"is_quote_status":true,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"urls":[{"url":"https:\/\/t.co\/m79Enjzzqx","expanded_url":"http:\/\/bit.ly\/1VTuSAh","display_url":"bit.ly\/1VTuSAh","indices":[99,122]},{"url":"https:\/\/t.co\/mraolX1MZv","expanded_url":"https:\/\/twitter.com\/SenatorBurr\/status\/742008835993444352","display_url":"twitter.com\/SenatorBurr\/st\u2026","indices":[123,140]}],"user_mentions":[{"screen_name":"NC_N8","name":"(((Nate)))","id":19185921,"id_str":"19185921","indices":[3,9]}],"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"en","timestamp_ms":"1465755358624"}
>
> How do you split this kind of (JSON?) file into  separate flowfiles?
>
> Thanks!
>
> pat
> ( ͡° ͜ʖ ͡°)
>
> "A wise man can learn more from a foolish question than a fool can learn
> from a wise answer". ~ Bruce Lee.
