[ https://issues.apache.org/jira/browse/YARN-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15169833#comment-15169833 ]
Varun Saxena commented on YARN-3863:
------------------------------------

As the patch is quite large, I will jot down what has been done in this JIRA to aid review.

# The intention is to convert filters which were represented as maps or sets to TimelineFilterList, which allows complex filters to be supported. For example, take config filters in the format {{cfg1=value1, cfg2=value2, cfg3=value3, cfg4=value4}}, which means all the key-value pairs must match for an entity. With the work in this JIRA, we can support complex filters such as {{(cfg1=value1 OR cfg2=value2) AND (cfg3!=value3 AND cfg4!=value4)}}.
# Similarly, current metric filters only check whether a certain metric exists for an entity and do not compare against metric values, for instance {{metric1 >= 40}}. This will be supported now.

Now coming to the code:

* *TimelineEntityFilters.java*
Filter representation has been changed. All filters will now be represented as a TimelineFilterList to support complex filters with ANDs and ORs. What kind of filters each filter list will hold is covered in the next point.

* *TimelinexxxFilter.java*
I have added 4 new filter classes here. All these filter classes can be put inside a TimelineFilterList to construct complex filters using ANDs and ORs. All these filters will then be converted to HBase Filters in the HBase implementation.

*TimelineEqualityFilter* is meant to match key-value pairs. This will be used to represent config and info filters. The comparison can be either "equal to" or "not equal to".

*TimelineMultiValEqualityFilter* matches a key against a list/set of values. These values must be a subset of what each entity contains. This is used to match relations (relatesTo and isRelatedTo). For instance, if we specify entitytype=id1,id2,id3, then for each entity we check whether id1, id2 and id3 all exist in the relations specified for that entitytype.
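To make the idea of nested filter lists concrete, here is a minimal, self-contained sketch of how AND/OR lists of equality and multi-value filters could be evaluated against an entity. All names here ({{FilterSketch}}, {{Entity}}, {{equality}}, {{multiVal}}, {{list}}) are hypothetical stand-ins and not the signatures from the patch. Treating a missing key as satisfying "not equal to" is also an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins for the filter classes described above; these are
// NOT the signatures used in the patch.
final class FilterSketch {
  enum Operator { AND, OR }

  /** Minimal view of an entity: config key/values and relation ids by type. */
  static final class Entity {
    final Map<String, String> configs = new HashMap<>();
    final Map<String, Set<String>> relations = new HashMap<>();
  }

  interface Filter {
    boolean matches(Entity e);
  }

  /** Key/value match, either "equal to" (equal=true) or "not equal to". */
  static Filter equality(String key, String value, boolean equal) {
    // Assumption: a missing key counts as "not equal".
    return e -> value.equals(e.configs.get(key)) == equal;
  }

  /** The given ids must be a subset of the entity's relations for the type. */
  static Filter multiVal(String type, String... ids) {
    return e -> {
      Set<String> actual = e.relations.get(type);
      return actual != null && actual.containsAll(Arrays.asList(ids));
    };
  }

  /** A list of filters (possibly nested lists) joined by AND or OR. */
  static Filter list(Operator op, Filter... filters) {
    List<Filter> children = Arrays.asList(filters);
    return e -> op == Operator.AND
        ? children.stream().allMatch(f -> f.matches(e))
        : children.stream().anyMatch(f -> f.matches(e));
  }

  /** (cfg1=value1 OR cfg2=value2) AND (cfg3!=value3 AND cfg4!=value4) */
  static Filter exampleConfigFilter() {
    return list(Operator.AND,
        list(Operator.OR,
            equality("cfg1", "value1", true),
            equality("cfg2", "value2", true)),
        list(Operator.AND,
            equality("cfg3", "value3", false),
            equality("cfg4", "value4", false)));
  }
}
```

In the actual patch, config, info and metric filter lists are converted to HBase filters and evaluated by the backend rather than in memory; the sketch only illustrates the AND/OR semantics.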
For TimelineMultiValEqualityFilter, it does not matter if other IDs (within the scope of the entity type) are also specified as relations for the entity.

*TimelineCompareFilter* - As the name suggests, it is used for comparison. This is used to represent metric filters. All relational operators (=, !=, >, >=, < and <=) are supported.

*TimelineExistsFilter* - This checks if the value exists. It is used for event filters to represent whether an event exists, and is transformed into HBase's column qualifier filter.

* *xxxEntityReader.java*
These classes are meant to read from the different tables in the HBase backend. They contain the primary changes for the HBase implementation. I have focused on adding ample comments in the code for this part, but as it is important, I will explain it here as well. Basically, we create an HBase filter list based on the fields, configs and metrics to retrieve (done in YARN-3862) and a filter list based on filters, which is done in this JIRA.

*TimelineEntityReader* - In this class we introduce a new abstract method {{constructFilterListBasedOnFilters}}, which will be implemented by derived classes to create a filter list based on filters. For a single entity read, a filter list based on filters does not make sense, so the filter list created will only be based on fields. For multiple entity reads, though, we create a new filter list containing filters and fields together.

*GenericEntityReader* - The changes here are meant for the entity table, plus some common code which is used by ApplicationEntityReader as well. In {{constructFilterListBasedOnFilters}}, HBase filter lists are created for created time range, config, info and metric filters. Relation-based filters and event filters cannot be directly added here because of the way events and relations are stored in the backend HBase storage. That is, we cannot apply a SingleColumnValueFilter to filter out rows. So for them, we add filters to fetch only the columns which we require to match these filters locally.
Fetching only the columns needed for relation and event filters is done only if these fields are not mentioned in fieldsToRetrieve. For example, if I have event filters coming as (event1, event2) and fields to retrieve does not mention EVENTS or ALL, I will read only the event columns corresponding to event1 and event2 for the filtered rows. This reduces the amount of data retrieved from the backend, especially for events, because the number of events can be quite large. The code for this is in the method {{constructFilterListBasedOnFields}}.

Now coming to the new methods which have been added.

{{fetchPartialColsFromInfoFamily}} checks if we need to get only some of the columns for relations and events from the backend. This depends on the condition explained above.

{{createFilterListForColsOfInfoFamily}} is called if the above condition is true. Here the idea is to add each column under the INFO column family which is in fields to retrieve, or which has to be added because we want to match certain relation filters or event filters. So we add a QualifierFilter for each column (done in {{updateFixedColumns}}) and also add qualifier filters for info, relations and events, all of which come under the INFO column family. The qualifier filters for relations and events are created based on whether we want to fetch only some of the column qualifiers based on relations/events, or whether we can fetch all relations/events using the column prefix (check the if-else in the method).

{{excludeFieldsFromInfoColFamily}} is called if {{fetchPartialColsFromInfoFamily}} returns false. Here we exclude certain columns (based on column prefix) from the INFO column family which we do not want based on fields to retrieve. For example, if fields to retrieve specifies all fields except events, this effectively produces a filter list like {{(FamilyFilter = INFO) AND (QualifierFilter != Event Prefix)}}.

{{updateFilterForConfsAndMetricsToRetrieve}} is for configs and metrics to retrieve. I have separated this out into a new method and made minor changes to its implementation over YARN-3862. You can check the comments there.
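The decision described for event columns can be sketched roughly as follows. This is a simplified illustration in plain Java, not the code from the patch: the class, the {{Field}} enum, the method names and the {{"e!"}} qualifier prefix are all placeholders assumed for the example, and relations would follow the same pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; names and the "e!" prefix are placeholders, not the
// real column prefix handling in the patch.
final class PartialFetchSketch {
  enum Field { ALL, INFO, EVENTS }

  static final String EVENT_PREFIX = "e!";

  /**
   * True when event filters exist but the caller did not ask for events,
   * so only the event columns needed for local matching should be read.
   */
  static boolean fetchPartialEventCols(
      Set<Field> fieldsToRetrieve, List<String> eventFilterIds) {
    boolean eventsRequested = fieldsToRetrieve.contains(Field.ALL)
        || fieldsToRetrieve.contains(Field.EVENTS);
    return !eventFilterIds.isEmpty() && !eventsRequested;
  }

  /** Qualifier prefixes that would be turned into qualifier filters. */
  static List<String> eventQualifierPrefixes(
      Set<Field> fieldsToRetrieve, List<String> eventFilterIds) {
    List<String> prefixes = new ArrayList<>();
    if (fetchPartialEventCols(fieldsToRetrieve, eventFilterIds)) {
      // Only the columns for the events named in the filters.
      for (String id : eventFilterIds) {
        prefixes.add(EVENT_PREFIX + id);
      }
    } else if (fieldsToRetrieve.contains(Field.ALL)
        || fieldsToRetrieve.contains(Field.EVENTS)) {
      // Events were requested anyway: fetch everything under the prefix.
      prefixes.add(EVENT_PREFIX);
    }
    // Otherwise no event columns are needed at all.
    return prefixes;
  }
}
```

With event filters (event1, event2) and fields to retrieve set to INFO only, this sketch yields one prefix per filtered event instead of the whole event prefix.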
Moreover, there are changes to {{parseEntity}}, primarily to remove code related to matching filters locally. We no longer need to match created time range, info, metric and config filters locally, as HBase will now filter out rows based on these filters. Relation and event filters will still be matched locally. Relevant comments have been added here.

*ApplicationEntityReader* - The changes here are exactly the same as in GenericEntityReader, except that they are made for application table columns and column prefixes. I had moved some of the methods which had a similar implementation to GenericEntityReader (which is the base class of ApplicationEntityReader) to reduce code size. There is one comment on it from Sangjin, so some code from GenericEntityReader will have to be moved back here for clear separation.

*FlowRunEntityReader* will have only metric filters. These were not supported so far and are supported now. In the flow run table, metrics reside under the INFO column family and not in the METRIC column family as was the case above. The changes are along similar lines, though. In {{constructFilterListBasedOnFilters}} we add filters for start time range and metric filters. In {{constructFilterListBasedOnFields}}, some changes have been made to correct the logic. The comments there should be self-explanatory.

*FlowActivityEntityReader* - No filters are applied for the flow activity table, so {{constructFilterListBasedOnFilters}} merely returns null.

* *TimelineFilterUtils.java*
This class contains a set of utility methods to convert timeline filters into HBase filters, which we then add to the HBase Get/Scan for querying the backend. The new methods added are as follows:

{{fetchColumnsFromFilterList}} - Here we fetch a list of columns from the filter list based on the keys in TimelineMultiValEqualityFilter and TimelineExistsFilter. This is used for relation filters and event filters. We use it to decide which columns to fetch from the backend to match relation and event filters.
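As a rough illustration of collecting columns from a possibly nested filter list, the following sketch walks a small filter tree and gathers the keys of the exists and multi-value filters. The node classes and {{collectColumns}} are hypothetical names for this example only and do not mirror the actual {{fetchColumnsFromFilterList}} signature.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical node types standing in for the timeline filter classes.
final class ColumnCollectorSketch {
  interface Node { }

  /** Stand-in for an exists filter, keyed by event id. */
  static final class ExistsFilter implements Node {
    final String key;
    ExistsFilter(String key) { this.key = key; }
  }

  /** Stand-in for a multi-value equality filter, keyed by entity type. */
  static final class MultiValFilter implements Node {
    final String key;
    final Set<String> values;
    MultiValFilter(String key, String... values) {
      this.key = key;
      this.values = new LinkedHashSet<>(Arrays.asList(values));
    }
  }

  /** Stand-in for a filter list; the AND/OR operator is irrelevant here. */
  static final class FilterList implements Node {
    final List<Node> children;
    FilterList(Node... children) { this.children = Arrays.asList(children); }
  }

  /** Collects the keys of all exists and multi-value filters, in order. */
  static Set<String> collectColumns(Node root) {
    Set<String> columns = new LinkedHashSet<>();
    collect(root, columns);
    return columns;
  }

  private static void collect(Node node, Set<String> out) {
    if (node instanceof FilterList) {
      // Recurse into nested lists so deeply nested filters are not missed.
      for (Node child : ((FilterList) node).children) {
        collect(child, out);
      }
    } else if (node instanceof ExistsFilter) {
      out.add(((ExistsFilter) node).key);
    } else if (node instanceof MultiValFilter) {
      out.add(((MultiValFilter) node).key);
    }
    // Any other filter type contributes no columns in this sketch.
  }
}
```

The returned keys would then be the input for building qualifier filters for only those columns.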
Refer to the details under xxxEntityReader.java for how these columns are used.

Based on the columns returned by {{fetchColumnsFromFilterList}}, {{createFiltersFromColumnQualifiers}} is called to create qualifier filters for the relevant columns to be fetched.

{{createHBaseSingleColValueFilter}}, as the name suggests, creates a SingleColumnValueFilter. It is used for compare and equality filters and is called from the {{createHBaseFilterList}} method.

{{createSingleColValueFiltersByRange}} creates single column value filters by range. It is used for the created time range. As Sangjin said, this method can reuse code in {{createHBaseSingleColValueFilter}}. I will do so in the next patch.

* *TimelineReaderWebServicesUtils.java*
This utility class contains helpers to parse the different fields coming in through the REST API, i.e. parsing filters, etc. Most of the changes here are stop-gap changes until YARN-4447 is done, and have been made to keep the REST API code consistent with the code added in this JIRA. There have been a couple of comments related to this, so let me explain the changes.

*metricfilters* are converted to a list of {{TimelineCompareFilter}} filters. Currently we just check the existence of a metric for metric filters, but after this JIRA we will compare the values of these metrics too (in the storage layer). Because the REST layer only gets a list of metrics to check for existence, while the storage layer has code which compares values, I create a compare filter with {{metric >= 0}}. This is equivalent to checking whether a metric exists. Once YARN-4447 goes in, we will support all comparison operators at the REST layer.

*eventfilters* are converted to a list of {{TimelineExistsFilter}} filters. The comparison operator is kept as EQUALS; not equals will be supported after YARN-4447.

*config and info filters* are converted to a list of {{TimelineEqualityFilter}} filters. As above, not equals will be supported after YARN-4447.

*relation filters* are converted to a list of {{TimelineMultiValEqualityFilter}} filters.
As with config and info filters, not equals for relation filters will be supported after YARN-4447.

* *xxxColumn.java, xxxColumnPrefix.java*
Some helper methods (such as getColumnFamilyBytes, getValueConverter, etc.) have been added for use in the filter implementation. getCompoundColQualBytes will be required for events. I will add the relevant code in the next patch.

cc [~sjlee0], [~djp], [~vrushalic]. I hope this info will help in review.

> Support complex filters in TimelineReader
> -----------------------------------------
>
> Key: YARN-3863
> URL: https://issues.apache.org/jira/browse/YARN-3863
> Project: Hadoop YARN
> Issue Type: Sub-task
> Affects Versions: YARN-2928
> Reporter: Varun Saxena
> Assignee: Varun Saxena
> Labels: yarn-2928-1st-milestone
> Attachments: YARN-3863-YARN-2928.v2.01.patch,
> YARN-3863-YARN-2928.v2.02.patch, YARN-3863-feature-YARN-2928.wip.003.patch,
> YARN-3863-feature-YARN-2928.wip.01.patch,
> YARN-3863-feature-YARN-2928.wip.02.patch,
> YARN-3863-feature-YARN-2928.wip.04.patch,
> YARN-3863-feature-YARN-2928.wip.05.patch
>
>
> Currently filters in timeline reader will return an entity only if all the
> filter conditions hold true i.e. only AND operation is supported. We can
> support OR operation for the filters as well. Additionally as primary backend
> implementation is HBase, we can design our filters in a manner, where they
> closely resemble HBase Filters.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)