Thanks. 8 of the 10 CRUZADAS coefficients are estimated as zero. The alpha and lambda parameters are left at their defaults (zero, I think). The models in R and SAS were fitted with glm as binary logistic regressions.
Cheers

From: Simon Dirmeier <simon.dirme...@web.de>
Date: Tuesday, 24 October 2017, 08:30
To: Alexis Peña <alexis.p...@exalitica.com>, <user@spark.apache.org>
Subject: Re: Zero Coefficient in logistic regression

So, all the coefficients are the same except for CRUZADAS? How are you fitting the model in R (glm)? Can you try setting zero penalty for alpha and lambda:

    .setRegParam(0)
    .setElasticNetParam(0)

Cheers,
S

On 24.10.17 at 13:19, Alexis Peña wrote:

Thanks for your answer. The “Cruzadas” features are binary (0/1). The chi-squared statistic should work with 2x2 tables. I fitted the model in SAS and R, and in both the coefficients have estimates (not significant). Two features of this kind have estimates:

    CRUZADAS  4907   0,247624087
    CRUZADAS  5304  -0,161424508

Thanks

From: Weichen Xu <weichen...@databricks.com>
Date: Tuesday, 24 October 2017, 07:23
To: Alexis Peña <alexis.p...@exalitica.com>
CC: "user @spark" <user@spark.apache.org>
Subject: Re: Zero Coefficient in logistic regression

Yes, the chi-squared statistic is only used for categorical features. It does not look appropriate here.

Thanks!

On Tue, Oct 24, 2017 at 5:13 PM, Simon Dirmeier <simon.dirme...@web.de> wrote:

Hey,

as far as I know, feature selection using a chi-squared statistic can only be done on categorical features, not on possibly continuous ones. Furthermore, since your logistic model doesn't use any regularization, you should be fine here. So I'd check the ChiSqSelector and possibly replace it with another feature selection method. There is, however, always the chance that your response does not depend on your covariates, in which case you'd estimate a zero coefficient.

Cheers,
Simon

On 24.10.17 at 04:56, Alexis Peña wrote:

Hi guys,

We are fitting a logistic model using the following code.
    import org.apache.spark.ml.Pipeline
    import org.apache.spark.ml.classification.LogisticRegression
    import org.apache.spark.ml.feature.{ChiSqSelector, VectorAssembler}

    // Keep the 10 features of VECTOR_1 ranked highest by the chi-squared test against TARGET.
    val Chisqselector = new ChiSqSelector()
      .setNumTopFeatures(10)
      .setFeaturesCol("VECTOR_1")
      .setLabelCol("TARGET")
      .setOutputCol("selectedFeatures")

    // Combine the selected features with the remaining predictors into one vector column.
    val assembler = new VectorAssembler()
      .setInputCols(Array("FEATURES", "selectedFeatures", "PROM_MESES_DIST", "RECENCIA",
        "TEMP_MIN", "TEMP_MAX", "PRECIPITACIONES"))
      .setOutputCol("Union")

    // Logistic regression with default parameters.
    val lr = new LogisticRegression()
      .setLabelCol("TARGET")
      .setFeaturesCol("Union")

    val pipeline = new Pipeline().setStages(Array(Chisqselector, assembler, lr))

Do you know why the coefficients for the following features are estimated as zero? Is this produced by the ChiSqSelector or by the logistic model?

Thanks in advance!!

    CODIGO      PARAMETRO         COEFICIENTES_MUESTREO_BALANCEADO
    PROPIAS     CV_UM              0,276866756
    PROPIAS     CV_U3M            -0,241851427
    PROPIAS     CV_U6M            -0,568312819
    PROPIAS     CV_U12M            0,134706601
    PROPIAS     M_UM               5,47E-06
    PROPIAS     M_U3M             -7,10E-06
    PROPIAS     M_U6M              1,73E-05
    PROPIAS     M_U12M            -5,41E-06
    PROPIAS     CP_UM             -0,050750105
    PROPIAS     CP_U3M             0,125483162
    PROPIAS     CP_U6M            -0,353906788
    PROPIAS     CP_U12M            0,159538155
    PROPIAS     TUM               -0,020217902
    PROPIAS     TU3M               0,002101906
    PROPIAS     TU6M              -0,005481915
    PROPIAS     TU12M              0,003443081
    CRUZADAS    2303               0
    CRUZADAS    3901               0
    CRUZADAS    3905               0
    CRUZADAS    3907               0
    CRUZADAS    3909               0
    CRUZADAS    4102               0
    CRUZADAS    4307               0
    CRUZADAS    4501               0
    CRUZADAS    4907               0,247624087
    CRUZADAS    5304              -0,161424508
    LP          PROM_MESES_DIST   -0,680356554
    PROPIAS     RECENCIA          -0,00289069
    EXTERNAS    TEMP_MIN           0,006488683
    EXTERNAS    TEMP_MAX          -0,013497441
    EXTERNAS    PRECIPITACIONES   -0,007607086
    INTERCEPTO                     2,401593191
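
On the 2x2 point raised in the thread: for a binary (0/1) feature against a binary target, the Pearson chi-squared statistic can be computed by hand from the four cell counts, which gives a quick way to sanity-check individual CRUZADAS features outside the pipeline. The following is a minimal sketch in plain Scala. The object name `ChiSq2x2`, the function `statistic`, and the cell layout are assumptions made for illustration; this is not Spark's implementation, and it applies no continuity correction.

```scala
object ChiSq2x2 {
  /** Pearson chi-squared statistic (no continuity correction) for the 2x2 table
    *
    *                target = 0   target = 1
    *   feature = 0     n00          n01
    *   feature = 1     n10          n11
    *
    * Returns NaN when any row or column total is zero (for example a feature
    * that is constant in the sample), because the statistic is undefined there.
    */
  def statistic(n00: Long, n01: Long, n10: Long, n11: Long): Double = {
    require(n00 >= 0 && n01 >= 0 && n10 >= 0 && n11 >= 0, "counts must be non-negative")
    // Work in Double to avoid Long overflow when multiplying large counts.
    val (a, b, c, d) = (n00.toDouble, n01.toDouble, n10.toDouble, n11.toDouble)
    val n = a + b + c + d
    // Product of the two row totals and the two column totals.
    val margins = (a + b) * (c + d) * (a + c) * (b + d)
    if (margins == 0.0) Double.NaN
    else {
      val diff = a * d - b * c
      n * diff * diff / margins
    }
  }
}
```

For example, `ChiSq2x2.statistic(10, 20, 30, 40)` evaluates to 50/63 (about 0.794), while a table whose rows are proportional, such as `ChiSq2x2.statistic(10, 10, 20, 20)`, gives 0. A feature that never takes the value 1 in the sample has an empty second row, so the function returns NaN for it.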