I don't think the question is the representation as double. The question is how this output represents a label. This looks like the output of an annotator. What are you classifying? First you need ground truth and predictions somewhere before you can use any utility to compute classification metrics.
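If it helps, here is a rough sketch in plain Python (no Spark) of what "ground truth and prediction as doubles" could look like for token-level tags. It assumes each row holds two equally long arrays of annotation structs and that the tag string sits in the `result` field, as in the schema quoted below. The column names `label` and `ner`, the helper names, and the sample rows are all made up for illustration. They are not part of SparkNLP or Spark ML.

```python
from collections import Counter


def flatten_tags(rows, label_col="label", pred_col="ner"):
    """Return one (gold_tag, predicted_tag) pair per token across all rows.

    The column names are assumptions; adjust them to the actual Dataset.
    """
    pairs = []
    for row in rows:
        gold = [annotation["result"] for annotation in row[label_col]]
        pred = [annotation["result"] for annotation in row[pred_col]]
        if len(gold) != len(pred):
            raise ValueError("label and prediction arrays differ in length")
        pairs.extend(zip(gold, pred))
    return pairs


def build_index(pairs):
    """Map every tag to a float index, most frequent tag first."""
    counts = Counter(tag for pair in pairs for tag in pair)
    # Ties are broken alphabetically so the mapping is deterministic.
    ordered = sorted(counts, key=lambda tag: (-counts[tag], tag))
    return {tag: float(i) for i, tag in enumerate(ordered)}


def to_doubles(pairs, index):
    """Replace the tag strings with their float indices."""
    return [(index[gold], index[pred]) for gold, pred in pairs]


# Hypothetical sample data: two sentences, five tokens in total.
rows = [
    {"label": [{"result": "B-PER"}, {"result": "O"}, {"result": "B-LOC"}],
     "ner":   [{"result": "B-PER"}, {"result": "O"}, {"result": "O"}]},
    {"label": [{"result": "O"}, {"result": "B-LOC"}],
     "ner":   [{"result": "O"}, {"result": "B-LOC"}]},
]

pairs = flatten_tags(rows)
index = build_index(pairs)
doubles = to_doubles(pairs, index)

# Token-level accuracy, computed here only to sanity-check the mapping.
accuracy = sum(gold == pred for gold, pred in doubles) / len(doubles)
print(index)
print(doubles)
print(accuracy)
```

Once labels and predictions are in that (label, prediction) double form, one row per token, they match the input shape that MulticlassClassificationEvaluator expects. Keep in mind that this scores individual tokens, which is not the same as scoring whole entities.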

On Mon, Oct 25, 2021 at 5:42 AM <mar...@wunderlich.com> wrote:

> Hello,
>
> I am using SparkNLP to do some NER. The result datastructure after
> training and classification is a Dataset<Row>, with one column each for
> labels and predictions. For evaluating the model, I would like to use the
> Spark ML class
> org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator. However,
> this evaluator expects labels as double numbers. In the case of an NER
> task, the results in my case are of type
> array<struct<annotatorType:string,begin:int,end:int,result:string,metadata:map<string,string>,embeddings:array<float>>>.
>
> It would be possible, of course, to convert this format to the required
> doubles. But is there a way to easily apply
> MulticlassClassificationEvaluator to the NER task or is there maybe a
> better evaluator? I haven't found anything yet (neither in Spark ML nor in
> SparkNLP).
>
> Thanks a lot.
>
> Cheers,
>
> Martin