[ 
https://issues.apache.org/jira/browse/YARN-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15169833#comment-15169833
 ] 

Varun Saxena commented on YARN-3863:
------------------------------------

As the patch is quite large, to aid in review, I will jot down what has been 
done in this JIRA.

# The intention is to convert filters which were represented as maps or sets to 
TimelineFilterList, which makes it possible to support complex filters.
For example, config filters are currently given in the format {{cfg1=value1, 
cfg2=value2, cfg3=value3, cfg4=value4}}, which means all the key-value pairs 
must match for an entity. With the work in this JIRA, we can support complex 
filters such as {{(cfg1=value1 OR cfg2=value2) AND (cfg3!=value3 AND 
cfg4!=value4)}}. 
# Similarly, current metric filters just check whether a certain metric exists 
for an entity and do not compare against metric values, for instance 
{{metric1 >= 40}}. Such comparisons will be supported now.
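
To make the AND/OR nesting in point 1 concrete, here is a minimal plain-Java sketch that evaluates the example filter against an in-memory config map. The types {{ConfigFilterSketch}}, {{KeyValueFilterSketch}} and {{FilterGroupSketch}} are hypothetical stand-ins and not the classes in the patch. The sketch also assumes that an entity lacking a key matches neither = nor !=, which may differ from the actual behaviour.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration only; not the classes in the patch.
interface ConfigFilterSketch {
  boolean matches(Map<String, String> configs);
}

// Leaf filter: key = value or key != value.
final class KeyValueFilterSketch implements ConfigFilterSketch {
  private final String key;
  private final String value;
  private final boolean equal; // true for "=", false for "!="

  KeyValueFilterSketch(String key, String value, boolean equal) {
    this.key = key;
    this.value = value;
    this.equal = equal;
  }

  @Override
  public boolean matches(Map<String, String> configs) {
    String actual = configs.get(key);
    // Assumption: an entity that lacks the key matches neither operator.
    return actual != null && equal == actual.equals(value);
  }
}

// Composite filter: combines child filters with AND or OR.
final class FilterGroupSketch implements ConfigFilterSketch {
  enum Operator { AND, OR }

  private final Operator op;
  private final List<ConfigFilterSketch> children;

  FilterGroupSketch(Operator op, ConfigFilterSketch... children) {
    this.op = op;
    this.children = Arrays.asList(children);
  }

  @Override
  public boolean matches(Map<String, String> configs) {
    for (ConfigFilterSketch child : children) {
      boolean matched = child.matches(configs);
      if (op == Operator.OR && matched) {
        return true;  // one match is enough for OR
      }
      if (op == Operator.AND && !matched) {
        return false; // one miss is enough to fail AND
      }
    }
    // No early exit: AND saw only matches, OR saw none.
    return op == Operator.AND;
  }
}

final class ComplexFilterSketch {
  // Builds (cfg1=value1 OR cfg2=value2) AND (cfg3!=value3 AND cfg4!=value4).
  static ConfigFilterSketch example() {
    return new FilterGroupSketch(FilterGroupSketch.Operator.AND,
        new FilterGroupSketch(FilterGroupSketch.Operator.OR,
            new KeyValueFilterSketch("cfg1", "value1", true),
            new KeyValueFilterSketch("cfg2", "value2", true)),
        new FilterGroupSketch(FilterGroupSketch.Operator.AND,
            new KeyValueFilterSketch("cfg3", "value3", false),
            new KeyValueFilterSketch("cfg4", "value4", false)));
  }

  // Helper: builds a config map from alternating key, value arguments.
  static Map<String, String> configs(String... keyValues) {
    Map<String, String> map = new HashMap<>();
    for (int i = 0; i + 1 < keyValues.length; i += 2) {
      map.put(keyValues[i], keyValues[i + 1]);
    }
    return map;
  }

  public static void main(String[] args) {
    System.out.println(example().matches(
        configs("cfg1", "value1", "cfg3", "x", "cfg4", "y")));
    System.out.println(example().matches(
        configs("cfg1", "value1", "cfg3", "value3", "cfg4", "y")));
  }
}
```

The old map-based format corresponds to a single AND group of = leaves, so it remains expressible in the nested form.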

Now coming to code, 
* *TimelineEntityFilters.java*
Filter representation has been changed. All filters will now be represented as 
a TimelineFilterList to support complex filters with ANDs and ORs. The kind of 
filters each filter list holds is covered in the next point.

* *TimelinexxxFilter.java*
I have added 4 new filter classes here. All these filter classes can be put 
inside a TimelineFilterList to construct complex filters using ANDs and ORs.
All these filters will then be converted to HBase filters in the HBase 
implementation.
*TimelineEqualityFilter* is meant to match key-value pairs. This will be used 
to represent config and info filters. The comparison between key and value can 
be either equals or not equals.
*TimelineMultiValEqualityFilter* matches a key against a list/set of values. 
These values must be a subset of what each entity contains. This is used to 
match relations (relatesTo and isRelatedTo). For instance, if we specify 
entitytype=id1,id2,id3, this means that for each entity we will check whether 
id1, id2 and id3 all exist in its relations for that entitytype. It does not 
matter if other ids (within the scope of the entity type) are also specified as 
relations for the entity.
*TimelineCompareFilter* - As the name suggests, it is used for comparison. This 
is used to represent metric filters. All relational operators (=, !=, >, >=, < 
and <=) are supported.
*TimelineExistsFilter* - This checks if the value exists. Used for event 
filters to represent if an event exists. Transformed into HBase's column 
qualifier filter.
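
The matching semantics described for the compare, multi-value equality and exists filters can be sketched in plain Java as below. This is only an illustration of the semantics. {{FilterSemanticsSketch}} and its method names are hypothetical and do not appear in the patch, and the real classes are translated into HBase filters rather than evaluated like this.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration; names are not taken from the patch.
final class FilterSemanticsSketch {
  enum CompareOp {
    EQUAL, NOT_EQUAL, GREATER_THAN, GREATER_OR_EQUAL, LESS_THAN, LESS_OR_EQUAL
  }

  // Compare-filter semantics: apply a relational operator to a metric value.
  static boolean compare(long actual, CompareOp op, long expected) {
    switch (op) {
      case EQUAL:            return actual == expected;
      case NOT_EQUAL:        return actual != expected;
      case GREATER_THAN:     return actual > expected;
      case GREATER_OR_EQUAL: return actual >= expected;
      case LESS_THAN:        return actual < expected;
      case LESS_OR_EQUAL:    return actual <= expected;
      default:
        throw new IllegalArgumentException("Unknown operator: " + op);
    }
  }

  // Multi-value equality semantics: every required id must be present among
  // the entity's ids for the entity type; extra ids on the entity are fine.
  static boolean hasAllRelations(Set<String> entityIds, String... requiredIds) {
    return entityIds.containsAll(Arrays.asList(requiredIds));
  }

  // Exists semantics: the entity has an event with the given id.
  static boolean exists(Set<String> eventIds, String eventId) {
    return eventIds.contains(eventId);
  }

  // Helper: builds a set of ids.
  static Set<String> ids(String... ids) {
    return new HashSet<>(Arrays.asList(ids));
  }

  public static void main(String[] args) {
    // metric1 >= 40 for an entity whose metric1 is 45.
    System.out.println(compare(45, CompareOp.GREATER_OR_EQUAL, 40));
    // entitytype=id1,id2,id3 against an entity related to id1..id4.
    System.out.println(
        hasAllRelations(ids("id1", "id2", "id3", "id4"), "id1", "id2", "id3"));
    // The entity is missing id3, so the relation filter does not match.
    System.out.println(hasAllRelations(ids("id1", "id2"), "id1", "id2", "id3"));
  }
}
```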

* *xxxEntityReader.java*
These classes are meant to read from different tables from HBase backend. These 
classes contain the primary changes for HBase implementation.
I have tried to add ample comments in the code for this part, but as it is 
important, I will explain it here as well.
Basically we create an HBase filter list based on fields, configs and metrics 
to retrieve (done in YARN-3862) and a filter list based on filters, which is 
done in this JIRA.
*TimelineEntityReader* -  In this class we introduce a new abstract method 
{{constructFilterListBasedOnFilters}} which will be implemented by derived 
classes to create a filter list based on filters. For single entity read, a 
filter list based on filters does not make any sense so the filter list created 
will only be based on fields. For multiple entity reads though, we will create 
a new filter list containing filters and fields together.
*GenericEntityReader* - The changes here are meant for the entity table, plus 
some common code which is used by ApplicationEntityReader as well.
In {{constructFilterListBasedOnFilters}}, HBase filter lists are created for 
created time range, config, info and metric filters.
Relation based filters and event filters cannot be directly added here because 
of the way events and relations are stored in the backend HBase storage. That 
is, we cannot apply a SingleColumnValueFilter to filter out rows.
So for them, we add filters to fetch only the columns which we require to match 
these filters locally. This is only done if these fields are not mentioned in 
fieldsToRetrieve. 
For example, if I have event filters coming as (event1, event2) and fields to 
retrieve does not mention EVENTS or ALL, I will read all event columns 
corresponding to event1 and event2, for the filtered rows.
This will reduce the amount of data retrieved from the backend, especially for 
events, because the number of events can be large.
Code for this is in method {{constructFilterListBasedOnFields}}.
Now coming to the new methods which have been added.
{{fetchPartialColsFromInfoFamily}} checks if we need to get some of the columns 
for relations and events from backend. This depends on the condition explained 
above.
{{createFilterListForColsOfInfoFamily}} is called if the above condition is 
true. Here the idea is to add each column under the INFO column family which is 
in fields to retrieve or has to be added because we want to match certain 
relation filters or event filters. So we add a QualifierFilter for each column 
(done in {{updateFixedColumns}}) and also add qualifier filters for info, 
relations and events, all of which come under the INFO column family. The 
qualifier filters for relations and events are created depending on whether we 
want to fetch only some of the column qualifiers based on relations/events or 
can fetch all relations/events using the column prefix (check the if-else in 
the method).
{{excludeFieldsFromInfoColFamily}} is called if 
{{fetchPartialColsFromInfoFamily}} returns false. Here we exclude certain 
columns (based on column prefix) from the INFO column family which we do not 
want based on fields to retrieve. For example, if fields to retrieve specifies 
all fields except events, this will effectively produce a filter list like 
{{(FamilyFilter = INFO) AND (QualifierFilter != Event Prefix)}}.
{{updateFilterForConfsAndMetricsToRetrieve}} is for configs and metrics to 
retrieve. I have separated this out to a new method and made minor changes in 
its implementation over YARN-3862. You can check the comments there.
Moreover, there are changes to {{parseEntity}}, which are primarily to remove 
code related to matching filters locally. We no longer need to match created 
time range, info, metric and config filters locally as HBase will filter out 
rows based on these filters now. Relation and event filters will still be 
matched locally. Relevant comments have been added here.
*ApplicationEntityReader* - The changes here are exactly the same as in 
GenericEntityReader except that they are done for application table columns and 
column prefixes. I had moved some of the methods which had a similar 
implementation to GenericEntityReader (which is the base class of 
ApplicationEntityReader) to reduce code size. There is one comment on this from 
Sangjin, so some code from GenericEntityReader will have to be moved back here 
for clear separation.
*FlowRunEntityReader* will have only metric filters. These were not supported 
so far and are supported now. In the flow run table, metrics reside under the 
INFO column family and not in the METRIC column family as was the case above. 
Changes are along similar lines though. In constructFilterListBasedOnFilters we 
add filters for start time range and metric filters. In 
constructFilterListBasedOnFields, some changes have been made to correct the 
logic. Comments there should be self explanatory.
*FlowActivityEntityReader* - No filters are applied for flow activity table so 
*constructFilterListBasedOnFilters* merely returns null.
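
As a rough illustration of what the filter list {{(FamilyFilter = INFO) AND (QualifierFilter != Event Prefix)}} from the GenericEntityReader discussion keeps, the plain-Java sketch below applies the same two conditions to column names held in memory. It does not use the HBase API. The family name "i" and the event prefix "e!" are assumptions made up for this example, and {{ExcludeEventColumnsSketch}} is a hypothetical class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory simulation; the real code builds HBase filters.
final class ExcludeEventColumnsSketch {
  static final String INFO_FAMILY = "i";   // assumed family name
  static final String EVENT_PREFIX = "e!"; // assumed qualifier prefix for events

  // Each column is written as "family:qualifier". A column is kept only if
  // it is in the INFO family AND its qualifier lacks the event prefix.
  static List<String> keptColumns(String... columns) {
    List<String> kept = new ArrayList<>();
    for (String column : columns) {
      int sep = column.indexOf(':');
      if (sep < 0) {
        continue; // not in "family:qualifier" form, skip it
      }
      String family = column.substring(0, sep);
      String qualifier = column.substring(sep + 1);
      boolean inInfoFamily = family.equals(INFO_FAMILY);
      boolean isEventColumn = qualifier.startsWith(EVENT_PREFIX);
      if (inInfoFamily && !isEventColumn) {
        kept.add(column);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    // Event columns and columns from other families are dropped.
    System.out.println(keptColumns(
        "i:created_time", "i:e!event1", "i:r!type1", "m:metric1"));
  }
}
```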

* *TimelineFilterUtils.java*
This class contains a set of utility methods to convert timeline filters into 
HBase filters which we will then add to HBase Get/Scan for querying from 
backend.
The new methods added are as follows:
{{fetchColumnsFromFilterList}} - Here we fetch a list of columns from the 
filter list based on keys in TimelineMultiValEqualityFilter and 
TimelineExistsFilter. This is used for relation filters and event filters. We 
use it to decide which columns to fetch from backend to match relation and 
event filters. Refer to details in xxxEntityReader.java
Based on columns decided from method above, 
{{createFiltersFromColumnQualifiers}} is called to create qualifier filters for 
the relevant columns to be fetched.
{{createHBaseSingleColValueFilter}} as the name suggests is to create a 
SingleColumnValueFilter. This is called for compare and equality filters and 
called from {{createHBaseFilterList}} method.
{{createSingleColValueFiltersByRange}} creates single column value filters for 
a range. It is used for the created time range. As Sangjin said, this method 
can reuse code in {{createHBaseSingleColValueFilter}}. Will do so in the next 
patch.
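
The column collection that {{fetchColumnsFromFilterList}} is described as doing can be pictured as a walk over a nested filter list that gathers the key of every leaf filter. The plain-Java sketch below shows that idea only. {{ColumnCollectorSketch}} and its {{Node}} type are hypothetical, and the actual method works on TimelineFilterList and produces HBase column qualifiers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; not the implementation in TimelineFilterUtils.
final class ColumnCollectorSketch {
  // A node is either a leaf filter carrying a key (for example an event id)
  // or a nested filter list holding child nodes.
  static final class Node {
    final String key;          // non-null only for leaf filters
    final List<Node> children; // empty for leaf filters

    private Node(String key, List<Node> children) {
      this.key = key;
      this.children = children;
    }

    static Node leaf(String key) {
      return new Node(key, Collections.<Node>emptyList());
    }

    static Node list(Node... children) {
      return new Node(null, Arrays.asList(children));
    }
  }

  // Returns the distinct keys found in the tree, in first-seen order.
  static List<String> collectKeys(Node root) {
    Set<String> keys = new LinkedHashSet<>();
    collect(root, keys);
    return new ArrayList<>(keys);
  }

  private static void collect(Node node, Set<String> out) {
    if (node.key != null) {
      out.add(node.key); // leaf: record the column key
      return;
    }
    for (Node child : node.children) {
      collect(child, out); // nested list: recurse into children
    }
  }

  public static void main(String[] args) {
    Node filters = Node.list(
        Node.leaf("event1"),
        Node.list(Node.leaf("event2"), Node.leaf("event1")));
    System.out.println(collectKeys(filters));
  }
}
```

Each collected key would then be turned into one qualifier filter, so only the columns needed to match relation and event filters locally are fetched.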

* *TimelineReaderWebServicesUtils.java*
This utility class contains helpers to parse different fields coming in via the 
REST API, e.g. parsing filters. Most of the changes here are stopgap changes 
until YARN-4447 is done and have been made to keep the REST API code consistent 
with the code added in this JIRA. There have been a couple of comments related 
to this, so let me explain the changes.
*metricfilters* are converted to a list of {{TimelineCompareFilter}} filters. 
Currently we just check the existence of a metric for metric filters, but after 
this JIRA we will compare the values of these metrics too (in the storage 
layer). The REST layer still gets only a list of metrics to check for 
existence, while the storage layer now has code which compares values. Hence I 
create a compare filter with {{metric >= 0}}, which is equivalent to checking 
whether a metric exists. Once YARN-4447 goes in, we will support all comparison 
operators at the REST layer.
*eventfilters* are converted to a list of {{TimelineExistsFilter}} filters. 
The comparison operator is kept as EQUALS; not equals will be supported after 
YARN-4447.
*config and info filters* are converted to a list of {{TimelineEqualityFilter}} 
filters. Same as above, not equals will be supported after YARN-4447.
*relation filters* are converted to a list of 
{{TimelineMultiValEqualityFilter}} filters. Same as above, not equals will be 
supported after YARN-4447.
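
The {{metric >= 0}} stopgap for *metricfilters* can be illustrated with the small plain-Java sketch below. It assumes metric values are never negative, which is what makes the comparison behave like an existence check. {{MetricExistsSketch}} is a hypothetical class, and the map stands in for an entity's stored metrics.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of why "metric >= 0" acts as an existence check.
final class MetricExistsSketch {
  // A compare filter can only match when the metric is present, so a
  // missing metric fails the comparison whatever the threshold is.
  static boolean greaterOrEqual(
      Map<String, Long> metrics, String name, long threshold) {
    Long value = metrics.get(name);
    return value != null && value >= threshold;
  }

  // Helper: builds a metrics map holding a single metric.
  static Map<String, Long> metrics(String name, long value) {
    Map<String, Long> map = new HashMap<>();
    map.put(name, value);
    return map;
  }

  public static void main(String[] args) {
    Map<String, Long> entityMetrics = metrics("metric1", 40L);
    // Present and non-negative: matches, same result as an existence check.
    System.out.println(greaterOrEqual(entityMetrics, "metric1", 0L));
    // Absent: does not match, same result as an existence check.
    System.out.println(greaterOrEqual(entityMetrics, "metric2", 0L));
  }
}
```

Under this sketch a negative value would exist but fail the comparison, so the equivalence holds only while metric values stay at or above zero.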

* *xxxColumn.java, xxxColumnPrefix.java*
Some helper methods (such as getColumnFamilyBytes, getValueConverter, etc.) 
have been added to be used for the filter implementation.
getCompoundColQualBytes will be required for events. I will add the relevant 
code in the next patch.

cc [~sjlee0], [~djp], [~vrushalic].
I hope this info will help in review.

> Support complex filters in TimelineReader
> -----------------------------------------
>
>                 Key: YARN-3863
>                 URL: https://issues.apache.org/jira/browse/YARN-3863
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>    Affects Versions: YARN-2928
>            Reporter: Varun Saxena
>            Assignee: Varun Saxena
>              Labels: yarn-2928-1st-milestone
>         Attachments: YARN-3863-YARN-2928.v2.01.patch, 
> YARN-3863-YARN-2928.v2.02.patch, YARN-3863-feature-YARN-2928.wip.003.patch, 
> YARN-3863-feature-YARN-2928.wip.01.patch, 
> YARN-3863-feature-YARN-2928.wip.02.patch, 
> YARN-3863-feature-YARN-2928.wip.04.patch, 
> YARN-3863-feature-YARN-2928.wip.05.patch
>
>
> Currently filters in timeline reader will return an entity only if all the 
> filter conditions hold true i.e. only AND operation is supported. We can 
> support OR operation for the filters as well. Additionally as primary backend 
> implementation is HBase, we can design our filters in a manner, where they 
> closely resemble HBase Filters.


