Repository: asterixdb Updated Branches: refs/heads/master 1d8de29a3 -> 0aa650eb9
[NO ISSUE][DOC] Documentation update for Feed - user model changes: no - storage format changes: no - interface changes: no Details: - Changed to SQL++ syntax. - Updated all syntax to the current master version. - Removed RSS feed which is not used. - Validated all examples locally. Change-Id: I5ddc0fb3eabd6dcf37646ec0e48647d87c2bb3b2 Reviewed-on: https://asterix-gerrit.ics.uci.edu/2520 Sonar-Qube: Jenkins <jenk...@fulliautomatix.ics.uci.edu> Tested-by: Jenkins <jenk...@fulliautomatix.ics.uci.edu> Contrib: Jenkins <jenk...@fulliautomatix.ics.uci.edu> Integration-Tests: Jenkins <jenk...@fulliautomatix.ics.uci.edu> Reviewed-by: Till Westmann <ti...@apache.org> Project: http://git-wip-us.apache.org/repos/asf/asterixdb/repo Commit: http://git-wip-us.apache.org/repos/asf/asterixdb/commit/0aa650eb Tree: http://git-wip-us.apache.org/repos/asf/asterixdb/tree/0aa650eb Diff: http://git-wip-us.apache.org/repos/asf/asterixdb/diff/0aa650eb Branch: refs/heads/master Commit: 0aa650eb96a9801a424e7d85d54a5eb535a5415f Parents: 1d8de29 Author: Xikui Wang <xkk...@gmail.com> Authored: Thu Apr 19 16:04:16 2018 -0700 Committer: Xikui Wang <xkk...@gmail.com> Committed: Fri Apr 27 16:29:51 2018 -0700 ---------------------------------------------------------------------- .../src/site/markdown/feeds/tutorial.md | 204 +++++++------------ 1 file changed, 78 insertions(+), 126 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/asterixdb/blob/0aa650eb/asterixdb/asterix-doc/src/site/markdown/feeds/tutorial.md ---------------------------------------------------------------------- diff --git a/asterixdb/asterix-doc/src/site/markdown/feeds/tutorial.md b/asterixdb/asterix-doc/src/site/markdown/feeds/tutorial.md index d2d4488..f5635b8 100644 --- a/asterixdb/asterix-doc/src/site/markdown/feeds/tutorial.md +++ b/asterixdb/asterix-doc/src/site/markdown/feeds/tutorial.md @@ -23,7 +23,7 @@ * [Introduction](#Introduction) * [Feed Adapters](#FeedAdapters) -<!-- * [Feed Policies](#FeedPolicies) --> +* [Feed Policies](#FeedPolicies) ## <a name="Introduction">Introduction</a> ## @@ -61,11 +61,13 @@ that cover the popular scenarios of ingesting data from (a) Twitter (b) RSS (c) ####Ingesting Twitter Stream We shall use the built-in push-based Twitter adapter. As a pre-requisite, we must define a Tweet using the AsterixDB Data Model (ADM) -and the AsterixDB Query Language (AQL). Given below are the type definitions in AQL +and the query language SQL++. Given below are the type definitions in SQL++ that create a Tweet datatype which is representative of a real tweet as obtained from Twitter. + drop dataverse feeds if exists; + create dataverse feeds; - use dataverse feeds; + use feeds; create type TwitterUser as closed { screen_name: string, @@ -77,13 +79,12 @@ that create a Tweet datatype which is representative of a real tweet as obtained create type Tweet as open { id: int64, user: TwitterUser - } + }; - create dataset Tweets (Tweet) - primary key id; + create dataset Tweets (Tweet) primary key id; We also create a dataset that we shall use to persist the tweets in AsterixDB. -Next we make use of the `create feed` AQL statement to define our example data feed. +Next we make use of the `create feed` SQL++ statement to define our example data feed. #####Using the "push_twitter" feed adapter##### The "push_twitter" adapter requires setting up an application account with Twitter. To retrieve @@ -91,6 +92,7 @@ tweets, Twitter requires registering an application. Registration involves provi a name and a brief description for the application. Each application has associated OAuth authentication credentials that include OAuth keys and tokens. Accessing the Twitter API requires providing the following. + 1. Consumer Key (API Key) 2. Consumer Secret (API Secret) 3. Access Token @@ -101,18 +103,20 @@ parameters. End users are required to obtain the above authentication credential using the "push_twitter" adapter. For further information on obtaining OAuth keys and tokens and registering an application with Twitter, please visit http://apps.twitter.com -Given below is an example AQL statement that creates a feed called "TwitterFeed" by using the +Given below is an example SQL++ statement that creates a feed called "TwitterFeed" by using the "push_twitter" adapter. - use dataverse feeds; + use feeds; - create feed TwitterFeed if not exists using "push_twitter" - (("type-name"="Tweet"), - ("format"="twitter-status"), - ("consumer.key"="************"), - ("consumer.secret"="**************"), - ("access.token"="**********"), - ("access.token.secret"="*************")); + create feed TwitterFeed with { + "adapter-name": "push_twitter", + "type-name": "Tweet", + "format": "twitter-status", + "consumer.key": "************", + "consumer.secret": "************", + "access.token": "**********", + "access.token.secret": "*************" + }; It is required that the above authentication parameters are provided valid. Note that the `create feed` statement does not initiate the flow of data from Twitter into @@ -122,23 +126,25 @@ to a target dataset using the connect feed statement and activated using the sta The Twitter adapter also supports several Twitter streaming APIs as follow: -1. Track filter ("keywords"="AsterixDB, Apache") -2. Locations filter ("locations"="-29.7, 79.2, 36.7, 72.0; -124.848974,-66.885444, 24.396308, 49.384358") -3. Language filter ("language"="en") -4. Filter level ("filter-level"="low") +1. Track filter `"keywords": "AsterixDB, Apache"` +2. Locations filter `"locations": "-29.7, 79.2, 36.7, 72.0; -124.848974,-66.885444, 24.396308, 49.384358"` +3. Language filter `"language": "en"` +4. Filter level `"filter-level": "low"` An example of Twitter adapter tracking tweets with keyword "news" can be described using following ddl: - use dataverse feeds; - - create feed TwitterFeed if not exists using "push_twitter" - (("type-name"="Tweet"), - ("format"="twitter-status"), - ("consumer.key"="************"), - ("consumer.secret"="**************"), - ("access.token"="**********"), - ("access.token.secret"="*************"), - ("keywords"="news")); + use feeds; + + create feed TwitterFeed with { + "adapter-name": "push_twitter", + "type-name": "Tweet", + "format": "twitter-status", + "consumer.key": "************", + "consumer.secret": "************", + "access.token": "**********", + "access.token.secret": "*************", + "keywords": "news" + }; For more details about these APIs, please visit https://dev.twitter.com/streaming/overview/request-parameters @@ -154,7 +160,7 @@ Multiple feeds can simultaneously be connected to a dataset such that the contents of the dataset represent the union of the connected feeds. Also one feed can be simultaneously connected to multiple target datasets. - use dataverse feeds; + use feeds; connect feed TwitterFeed to dataset Tweets; @@ -170,108 +176,53 @@ to connect TwitterFeed to a different dataset. Let the feed run for a minute, then run the following query to see the latest tweets that are stored into the data set. - use dataverse feeds; + use feeds; - for $i in dataset Tweets limit 10 return $i; + select * from Tweets limit 10; The dataflow of data from a feed can be terminated explicitly by `stop feed` statement. - use dataverse feeds; + use feeds; stop feed TwitterFeed; The `disconnnect statement` can be used to disconnect the feed from certain dataset. - use dataverse feeds; + use feeds; disconnect feed TwitterFeed from dataset Tweets; ###Ingesting with Other Adapters AsterixDB has several builtin feed adapters for data ingestion. User can also implement their own adapters and plug them into AsterixDB. -Here we introduce `rss_feed`, `socket_adapter` and `localfs` +Here we introduce `socket_adapter` and `localfs` feed adapter that cover most of the common application scenarios. -#####Using the "rss_feed" feed adapter##### -`rss_feed` adapter allows retrieving data given a collection of RSS end point URLs. -As observed in the case of ingesting tweets, it is required to model an RSS data item using AQL. - - use dataverse feeds; - - create type Rss if not exists as open { - id: string, - title: string, - description: string, - link: string - }; - - create dataset RssDataset (Rss) - primary key id; - -Next, we define an RSS feed using our built-in adapter "rss_feed". - - use dataverse feeds; - - create feed my_feed using - rss_feed ( - ("type-name"="Rss"), - ("format"="rss"), - ("url"="http://rss.cnn.com/rss/edition.rss") - ); - -In the above definition, the configuration parameter "url" can be a comma-separated list that reflects a -collection of RSS URLs, where each URL corresponds to an RSS endpoint or an RSS feed. -The "rss_feed" retrieves data from each of the specified RSS URLs (comma separated values) in parallel. - -The following statements connect the feed into the `RssDataset`: - - use dataverse feeds; - - connect feed my_feed to dataset RssDataset; - -The following statements activate the feed and start the dataflow: - - use dataverse feeds; - - start feed my_feed; - -The following statements show the latest data from the data set, stop the feed, and -disconnect the feed from the data set. - - use dataverse feeds; - - for $i in dataset RssDataset limit 10 return $i; - - stop feed my_feed - - disconnect feed my_feed from dataset RssDataset; - - #####Using the "socket_adapter" feed adapter##### `socket_adapter` feed opens a web socket on the given node which allows user to push data into AsterixDB directly. Here is an example: drop dataverse feeds if exists; create dataverse feeds; - use dataverse feeds; + use feeds; create type TestDataType as open { screenName: string - } + }; create dataset TestDataset(TestDataType) primary key screenName; - create feed TestSocketFeed using socket_adapter - ( - ("sockets"="127.0.0.1:10001"), - ("address-type"="IP"), - ("type-name"="TestDataType"), - ("format"="adm") - ); + create feed TestSocketFeed with { + "adapter-name": "socket_adapter", + "sockets": "127.0.0.1:10001", + "address-type": "IP", + "type-name": "TestDataType", + "format": "adm" + }; connect feed TestSocketFeed to dataset TestDataset; - use dataverse feeds; + use feeds; start feed TestSocketFeed; The above statements create a socket feed which is listening to "10001" port of the host machine. This feed accepts data @@ -297,25 +248,27 @@ by line into the socket feed using any socket client you like. Following is a so `localfs` adapter enables data ingestion from local file system. It allows user to feed data records on local disk into a dataset. A DDL example for creating a `localfs` feed is given as follow: - use dataverse feeds; + use feeds; - create type TweetType as closed { - id: string, - username : string, - location : string, - text : string, - timestamp : string - } + create type TestDataType as open { + screenName: string + }; - create dataset Tweets(TweetType) - primary key id; + create dataset TestDataset(TestDataType) primary key screenName; + + create feed TestFileFeed with { + "adapter-name": "localfs", + "type-name": "TestDataType", + "path": "HOSTNAME://LOCAL_FILE_PATH", + "format": "adm" + }; - create feed TweetFeed - using localfs - (("type-name"="TweetType"),("path"="HOSTNAME://LOCAL_FILE_PATH"),("format"="adm")) + connect feed TestFileFeed to dataset TestDataset; + + start feed TestFileFeed; Similar to previous examples, we need to define the datatype and dataset this feed uses. -The "path" parameter refers to the local datafile that we want to ingest data from. +The "path" parameter refers to the local data file that we want to ingest data from. `HOSTNAME` can either be the IP address or node name of the machine which holds the file. `LOCAL_FILE_PATH` indicates the absolute path to the file on that machine. Similarly to `socket_adapter`, this feed takes `adm` formatted data records. @@ -334,7 +287,7 @@ define a datatype with the primary key field, and specify that field to be autog Use that same datatype in feed definition will cause a type discrepancy since there is no such field in the datasource. Thus, we will need to define two separate datatypes for feed and dataset: - use dataverse feeds; + use feeds; create type DBLPFeedType as closed { dblpid: string, @@ -352,13 +305,13 @@ Thus, we will need to define two separate datatypes for feed and dataset: } create dataset DBLPDataset(DBLPDataSetType) primary key id autogenerated; - create feed DBLPFeed using socket_adapter - ( - ("sockets"="127.0.0.1:10001"), - ("address-type"="IP"), - ("type-name"="DBLPFeedType"), - ("format"="adm") - ); + create feed DBLPFeed with { + "adapter-name": "socket_adapter", + "sockets": "127.0.0.1:10001", + "address-type": "IP", + "type-name": "DBLPFeedType", + "format": "adm" + }; connect feed DBLPFeed to dataset DBLPDataset; @@ -403,7 +356,6 @@ spillage crosses a configured threshold. In all cases, the desired ingestion policy is specified as part of the `connect feed` statement or else the "Basic" policy will be chosen as the default. - use dataverse feeds; + use feeds; - connect feed TwitterFeed to dataset Tweets - using policy Basic; \ No newline at end of file + connect feed TwitterFeed to dataset Tweets using policy Basic; \ No newline at end of file