The data set consists of online transactions, each labeled as "fraud" or
"good", so this is a binary classification problem. With a decision tree,
we can identify the combinations of conditions that are likely to indicate
"fraud". I would welcome any advice.

The columns include:
transaction amount, timestamp, product_category, risk_score, city,
country, and fraud_flag (the label).

Most transactions are "good": say we have 1 million transactions in total,
of which only 1 thousand are flagged as "fraud".

We want to find the optimal "risk_score" threshold for each of the top
compromised cities and/or product_categories, which are where fraud
transactions cluster. We want to minimize the fraud rate while maximizing
the total sales volume.
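
The trade-off between fraud rate and sales volume can be made concrete with
a threshold sweep per city. This is only a sketch under assumptions that are
not stated above: a higher risk_score means riskier, a transaction is
accepted when its risk_score is below the threshold, and the amount column
is called "amount". The toy data is made up for illustration.

```python
import pandas as pd

def sweep_thresholds(df, thresholds):
    """For each threshold t, accept rows with risk_score < t (assumed rule)
    and report the accepted sales volume and sales-weighted fraud rate."""
    rows = []
    for t in thresholds:
        accepted = df[df["risk_score"] < t]
        total_sales = accepted["amount"].sum()
        fraud_sales = accepted.loc[accepted["fraud_flag"] == 1, "amount"].sum()
        rate = fraud_sales / total_sales if total_sales > 0 else 0.0
        rows.append({"threshold": t,
                     "accepted_sales": total_sales,
                     "fraud_rate": rate})
    return pd.DataFrame(rows)

# Made-up transactions; column names are hypothetical.
toy = pd.DataFrame({
    "city": ["A", "A", "A", "A", "B", "B"],
    "amount": [100.0, 50.0, 200.0, 50.0, 80.0, 20.0],
    "risk_score": [10, 40, 70, 90, 30, 85],
    "fraud_flag": [0, 0, 1, 1, 0, 1],
})

# One sweep table per city.
per_city = {city: sweep_thresholds(group, thresholds=[25, 50, 75, 100])
            for city, group in toy.groupby("city")}
print(per_city["A"])
```

Picking a threshold then amounts to choosing, per city, the row with the
largest accepted_sales whose fraud_rate is still acceptable.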

We are most interested in the decision rules that lead to leaf nodes with
fraud rate = fraud_sales / total_sales >= 20%

I am looking at DecisionTreeClassifier:
http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html

Because we want to extract rules, a complicated decision tree is not
practical, so I set max_depth=4.
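
A minimal sketch of fitting such a shallow tree and printing its rules with
sklearn.tree.export_text (available in recent scikit-learn releases). The
data is synthetic and only the numeric columns are used; the categorical
columns (city, country, product_category) would need to be encoded first,
which is not shown here.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for the real transactions (assumption for the demo):
# fraud is more likely when risk_score is above 80.
rng = np.random.RandomState(0)
n = 2000
amount = rng.uniform(5, 500, size=n)
risk_score = rng.uniform(0, 100, size=n)
y = (rng.uniform(size=n) < np.where(risk_score > 80, 0.3, 0.01)).astype(int)
X = np.column_stack([amount, risk_score])

# A shallow tree keeps the number of rules small (at most 16 leaves).
clf = DecisionTreeClassifier(max_depth=4, random_state=0)
clf.fit(X, y)

# Plain-text rendering of the split conditions, one line per node.
rules = export_text(clf, feature_names=["amount", "risk_score"])
print(rules)
```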

What is the right strategy for setting class_weight?

> *class_weight* : dict, list of dicts, “auto” or None, optional
> (default=None)
>
> Weights associated with classes in the form {class_label: weight}... For
> *multi-output* problems, a list of dicts can be provided in the same
> order as the columns of y.
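
For a single-output problem like this one, one possible starting point is
to weight each class inversely to its frequency. The sketch below uses
compute_class_weight with the "balanced" mode, which, as far as I know,
replaced "auto" in later scikit-learn releases. The label array only mimics
the 1,000 / 1,000,000 imbalance described above; whether balanced weights
are the right choice for the fraud-rate objective is an open question.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.class_weight import compute_class_weight

# Labels with the imbalance from the post: 999,000 good (0), 1,000 fraud (1).
y = np.array([0] * 999_000 + [1] * 1_000)

# "balanced" computes n_samples / (n_classes * count_of_class).
weights = compute_class_weight(class_weight="balanced",
                               classes=np.array([0, 1]), y=y)
class_weight = {0: weights[0], 1: weights[1]}
print(class_weight)

# Two equivalent ways to request this weighting for a single-output tree.
clf_balanced = DecisionTreeClassifier(max_depth=4, class_weight="balanced")
clf_manual = DecisionTreeClassifier(max_depth=4, class_weight=class_weight)
```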
In each leaf node, I want to output both

[number of fraud transactions, number of good transactions] and
[fraud sales volume, good sales volume].
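
One way to get both sets of numbers without involving class_weight at all
is to map every transaction to its leaf with apply() and aggregate with
pandas. This is a sketch on synthetic data; the column names "amount",
"leaf" and "fraud_amount" are made up for the example.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real transactions (assumption for the demo).
rng = np.random.RandomState(0)
n = 5000
df = pd.DataFrame({
    "amount": rng.uniform(5, 500, size=n),
    "risk_score": rng.uniform(0, 100, size=n),
})
p_fraud = np.where(df["risk_score"] > 80, 0.3, 0.01)
df["fraud_flag"] = (rng.uniform(size=n) < p_fraud).astype(int)

features = ["amount", "risk_score"]
clf = DecisionTreeClassifier(max_depth=4, random_state=0)
clf.fit(df[features], df["fraud_flag"])

# apply() returns the index of the leaf each sample ends up in.
df["leaf"] = clf.apply(df[features])
df["fraud_amount"] = df["amount"] * df["fraud_flag"]

leaf_stats = df.groupby("leaf").agg(
    n_total=("fraud_flag", "size"),
    n_fraud=("fraud_flag", "sum"),
    total_sales=("amount", "sum"),
    fraud_sales=("fraud_amount", "sum"),
)
leaf_stats["n_good"] = leaf_stats["n_total"] - leaf_stats["n_fraud"]
leaf_stats["good_sales"] = leaf_stats["total_sales"] - leaf_stats["fraud_sales"]
leaf_stats["fraud_rate"] = leaf_stats["fraud_sales"] / leaf_stats["total_sales"]

# Leaves meeting the fraud_sales / total_sales >= 20% criterion.
high_fraud_leaves = leaf_stats[leaf_stats["fraud_rate"] >= 0.20]
print(leaf_stats)
```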

Should I use a list of dicts for class_weight? e.g.

class_weight=[{0: 1, 1: 1}, {0: some_weight_to_be_figured_out,
1: other_weight}]
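
Per the docstring quoted above, the list-of-dicts form is meant for
multi-output problems, and here there is a single output (fraud_flag). A
possible alternative, sketched below on synthetic data, is to keep a single
class_weight dict and pass the transaction amount as sample_weight to
fit(). Under that assumption the class proportions in each leaf become
sales-weighted, so predict_proba returns fraud_sales / total_sales for the
leaf. Note that the tree is then also grown on sales-weighted impurity,
which may or may not be what is wanted.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real transactions (assumption for the demo).
rng = np.random.RandomState(0)
n = 5000
amount = rng.uniform(5, 500, size=n)
risk_score = rng.uniform(0, 100, size=n)
y = (rng.uniform(size=n) < np.where(risk_score > 80, 0.3, 0.01)).astype(int)
X = np.column_stack([amount, risk_score])

# Single-output problem: one dict. Each sample is weighted by its amount.
clf = DecisionTreeClassifier(max_depth=4, class_weight={0: 1, 1: 1},
                             random_state=0)
clf.fit(X, y, sample_weight=amount)

leaf = clf.apply(X)                       # leaf index per transaction
proba_fraud = clf.predict_proba(X)[:, 1]  # sales-weighted fraud share

# Recompute fraud_sales / total_sales per leaf by hand for comparison.
leaf_rate = {
    leaf_id: amount[(leaf == leaf_id) & (y == 1)].sum()
             / amount[leaf == leaf_id].sum()
    for leaf_id in np.unique(leaf)
}
expected = np.array([leaf_rate[leaf_id] for leaf_id in leaf])
print(np.allclose(proba_fraud, expected))
```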


Any tips are very welcome!


Best regards,
Rex