Hi Ram,
Yes, I completely agree. An exception is a poor way to handle this case, and
training on a dataset containing only 0 labels and no 1 labels should simply
work without exceptions.
Fortunately, it looks like someone else has recently patched the problem
with LogisticRegression:
Hi Ram,
I didn't include an explicit label column in my reproduction as I thought
it superfluous. However, in my original use-case, I was using a
StringIndexer, where the labels were indexed across the entire dataset
(training+validation+test). The (indexed) label column was then explicitly
Hi again Ram,
Sorry, I was too hasty in my previous response. I've done a bit more
digging through the code, and StringIndexer does indeed provide metadata,
as a NominalAttribute with a known number of class labels. I don't think
the issue is related to the use of metadata, however.
It seems
Hey David,
Yeah, absolutely! Feel free to create a JIRA and attach your patch to it.
We can help review it and pull in the fix; happy to accept contributions!
CCing Joseph, who is one of the maintainers of MLlib as well. When creating
the JIRA, can you attach a simple test case?
On Tue, Jan 26,
Hi David
If I am reading the email right, there are two problems here, right?
a) For rare classes, the random split will likely miss the rare class.
b) If it misses the rare class, an exception is thrown.
I thought the exception stems from b), is that right? I wouldn't expect
an exception to be
Hi Ram, Joseph,
That's right, but I will clarify:
(a) a random split can generate a training set that does not contain some
rare class
(b) when LogisticRegression is run over a dataframe where all instances
have the same class label, it throws an ArrayIndexOutOfBoundsException.
When (a) occurs,
Hey David
In your scenario, OneVsRest is training a classifier for 1 vs. not 1, and
the input dataset for fit (or train) has labeled data for label 1.
But the underlying binary classifier (LogisticRegression) uses sampling to
determine the subset of data to sample during each iteration and it is
BTW, OneVsRest uses the labels in the dataset that is fed to the fit
method if the metadata is missing.
So if the metadata contains a label, we expect that label to be present in
the dataset passed to the fit method.
If you want OneVsRest to compute the labels you can leave the label
Hi David
What happens if you provide the class labels via metadata instead of
letting OneVsRest determine the labels?
Ram
On Mon, Jan 25, 2016 at 3:06 PM, David Brooks wrote:
> Hi,
>
> I've run into an exception using MLlib OneVsRest with logistic regression
> (v1.6.0, but
Hi,
I've run into an exception using MLlib OneVsRest with logistic regression
(v1.6.0, but also in previous versions).
The issue is intermittent. When running multiclass classification with
K-fold cross validation, there are scenarios where the split does not
contain instances for every target
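
The intermittent nature of the problem described in this thread (a random split that sometimes leaves a rare class out of the training portion) can be illustrated without Spark. The following is only a minimal sketch in plain Python, not the MLlib implementation: the helper names `k_fold_indices` and `folds_missing_a_class`, the 100-instance dataset, and the choice of 5 folds are all assumptions made for illustration.

```python
import random


def k_fold_indices(n, k, seed):
    """Shuffle indices 0..n-1 and deal them into k unstratified folds."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def folds_missing_a_class(labels, k, seed):
    """Return the fold numbers whose training portion lacks at least one class."""
    all_classes = set(labels)
    missing = []
    for fold_no, held_out in enumerate(k_fold_indices(len(labels), k, seed)):
        held = set(held_out)
        # The training portion is everything outside the held-out fold.
        train_classes = {lab for j, lab in enumerate(labels) if j not in held}
        if train_classes != all_classes:
            missing.append(fold_no)
    return missing


# Hypothetical data: 98 instances of class 0 and only two instances of class 1.
labels = [0] * 98 + [1, 1]

# Count how many of 200 different random splits produce at least one
# training portion with no instance of the rare class.
affected = sum(1 for seed in range(200) if folds_missing_a_class(labels, 5, seed))
print(f"{affected} of 200 random 5-fold splits lose the rare class in some fold")
```

Under these assumptions only some seeds are affected, which would be consistent with a failure that appears on some runs and not others. If the rare class had a single instance, the fold holding that instance out would always train without it.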