I am on CentOS 7 and I use Spark 2.3.0. My code is posted below. Logistic
regression took 85 minutes and linear regression 127 seconds…
As I said, my dataset is 128 MB; it contains 1000 features and ~100 classes.
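
For a rough sense of scale, here is a back-of-the-envelope sketch of how many parameters each model has to fit. It assumes the figures above (1000 features, roughly 100 classes) and that the multinomial model holds one coefficient row plus one intercept per class. The variable names are only illustrative, and this count alone may not explain the whole runtime gap.

```python
# Rough parameter counts, assuming the figures quoted in this message.
num_features = 1000
num_classes = 100  # approximate ("~100 classes")

# Linear regression: one weight per feature plus one intercept.
linear_params = num_features + 1

# Multinomial logistic regression: one weight row plus one intercept per class.
multinomial_params = num_classes * (num_features + 1)

ratio = multinomial_params / linear_params
print(linear_params, multinomial_params, ratio)
```

So under these assumptions each optimizer iteration updates about 100 times as many parameters for the logistic model as for the linear one.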
import time

from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

# SparkSession
ss = SparkSession.builder.getOrCreate()
start = time.time()
# Read data ("file" holds the path to the CSV)
trainData = ss.read.format("csv").option("inferSchema", "true").load(file)
# Assemble every column except the first (_c0, the label) into a feature vector
assembler = VectorAssembler(inputCols=trainData.columns[1:],
                            outputCol="features")
trainData = assembler.transform(trainData)
# Drop the raw columns, keeping only _c0 and features
dropColumns = [e for e in trainData.columns if e not in ('_c0', 'features')]
trainData = trainData.drop(*dropColumns)
# Rename column from _c0 to label
trainData = trainData.withColumnRenamed("_c0", "label")
# Logistic regression
lr = LogisticRegression(maxIter=500, regParam=0.3, elasticNetParam=0.8)
lrModel = lr.fit(trainData)
# Output coefficients
print("Coefficients: " + str(lrModel.coefficientMatrix))
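
The script records `start` but never prints an elapsed time, so here is a minimal sketch of one way to time individual stages. The helper `timed` and the dictionary `timings` are hypothetical names, not part of Spark, and the workload is a plain-Python stand-in so the snippet runs without a cluster.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, timings):
    """Record the wall-clock seconds spent inside the with-block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


timings = {}

# Stand-in workload so the sketch runs without Spark; in the script above the
# body would instead be e.g. `lrModel = lr.fit(trainData)`.
with timed("stand-in work", timings):
    total = sum(i * i for i in range(100000))

for label, seconds in timings.items():
    print("%s: %.3f s" % (label, seconds))
```

Because Spark evaluates transformations lazily, most of the measured time will likely land on the step that triggers a job (such as `fit`), so it is worth timing that call separately from the read and assemble steps.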
- Thodoris
> On 27 Apr 2018, at 22:50, Irving Duran <[email protected]> wrote:
>
> Are you formatting the data correctly for logistic regression (meaning 0s
> and 1s) before modeling? Which OS and Spark version are you using?
>
> Thank You,
>
> Irving Duran
>
>
> On Fri, Apr 27, 2018 at 2:34 PM Thodoris Zois <[email protected]> wrote:
> Hello,
>
> I am running an experiment to test logistic and linear regression on Spark
> using MLlib.
>
> My dataset is only 128 MB, and something weird happens. Linear regression
> takes about 127 seconds with either 1 or 500 iterations. Logistic regression,
> on the other hand, most of the time does not manage to finish even with 1
> iteration; I usually get a heap memory error.
>
> In both cases I use the default cores and memory for the driver, and I spawn
> 1 executor with 1 core and 2 GB of memory.
>
> Apart from that, I get a warning about NativeBLAS. I searched the Internet
> and found that I have to install libgfortran, but even after installing it
> the warning remains.
>
> Any ideas for the above?
>
> Thank you,
> - Thodoris
>