Re: Spark ML Pipeline inaccessible types
Hi Martin,

Could you attach the code snippet and the stack trace? The default implementation of some methods uses reflection, which may be the cause.

Best,
Xiangrui

On Wed, Mar 25, 2015 at 3:18 PM, zapletal-mar...@email.cz wrote:

Thanks Peter,

I ended up doing something similar. However, I consider both of the approaches you mentioned bad practices, which is why I was looking for a solution directly supported by the current code. I can work with that for now, but it does not seem to be the proper solution.

Regards,
Martin

-- Original message --
From: Peter Rudenko petro.rude...@gmail.com
To: zapletal-mar...@email.cz, Sean Owen so...@cloudera.com
Date: 25. 3. 2015 13:28:38
Subject: Re: Spark ML Pipeline inaccessible types

Hi Martin, here are 2 possibilities to overcome this:

1) Put your logic into the org.apache.spark package in your project - then everything would be accessible.

2) Dirty trick:

object SparkVector extends HashingTF {
  val VectorUDT: DataType = outputDataType
}

then you can do this:

StructField(vectorTypeColumn, SparkVector.VectorUDT, false)

Thanks,
Peter Rudenko

On 2015-03-25 13:14, zapletal-mar...@email.cz wrote:

Sean, thanks for your response. I am familiar with NoSuchMethodException in general, but I think that is not the case this time. The code actually attempts to get a parameter by name using val m = this.getClass.getMethod(paramName). This may be a bug, but it is only a side effect of the real problem I am facing.

My issue is that VectorUDT is not accessible from user code, and therefore it is not possible to use a custom ML pipeline with the existing Predictors (see the last two paragraphs of my first email).

Best Regards,
Martin

-- Original message --
From: Sean Owen so...@cloudera.com
To: zapletal-mar...@email.cz
Date: 25. 3. 2015 11:05:54
Subject: Re: Spark ML Pipeline inaccessible types

NoSuchMethodError in general means that your runtime and compile-time environments are different. I think you first need to make sure you don't have mismatching versions of Spark.

On Wed, Mar 25, 2015 at 11:00 AM, zapletal-mar...@email.cz wrote:

Hi,

I have started implementing a machine learning pipeline using Spark 1.3.0 and the new pipelining API and DataFrames. I got to a point where I have my training data set prepared using a sequence of Transformers, but I am struggling to actually train a model and use it for predictions.

I am getting a java.lang.NoSuchMethodException: org.apache.spark.ml.regression.LinearRegression.myFeaturesColumnName() exception thrown at the checkInputColumn method in the Params trait when using a Predictor (LinearRegression in my case, but that should not matter). This looks like a bug - the exception is thrown when executing getParam(colName) when the require(actualDataType.equals(datatype), ...) requirement is not met, so the expected "requirement failed" exception is not thrown and is hidden by the unexpected NoSuchMethodException instead. I can raise a bug if this really is an issue and I am not using something incorrectly.

The problem I am facing, however, is that the Predictor expects features to have the VectorUDT type as defined in the Predictor class (protected def featuresDataType: DataType = new VectorUDT). But since this type is private[spark], my Transformer cannot prepare features with this type, which then correctly results in the exception above when I use a different type.

Is there a way to define a custom Pipeline that would be able to use the existing Predictors without having to bypass the access modifiers or reimplement something, or is the pipelining API not yet expected to be used in this way?

Thanks,
Martin

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
Re: Spark ML Pipeline inaccessible types
Hi Martin,

In the short term: would you be able to work with a type other than Vector? If so, then you can override the Predictor class's protected def featuresDataType: DataType with a DataFrame column type which fits your purpose. If you need Vector, then you might have to use a hack like Peter suggested.

In the long term: VectorUDT should indeed be made public, but that will have to wait until the next release.

Thanks for the feedback,
Joseph

On Fri, Mar 27, 2015 at 11:12 AM, Xiangrui Meng men...@gmail.com wrote:
[...]
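For readers of the archive, a minimal sketch of the short-term workaround Joseph describes, assuming an estimator class you control; MyPredictorBase stands in for whatever Predictor subclass that is, and the array-of-doubles feature type is an illustrative choice, not actual Spark API:

```scala
import org.apache.spark.sql.types.{ArrayType, DataType, DoubleType}

// Sketch only. MyPredictorBase is hypothetical; substitute the Predictor
// subclass you actually control. Overriding the protected schema hook
// relaxes the input check so the features column may be a public type
// (here Array[Double]) instead of the private[spark] VectorUDT.
class ArrayFeatureRegressor extends MyPredictorBase {
  override protected def featuresDataType: DataType =
    ArrayType(DoubleType, containsNull = false)
}
```

Note that this only changes which column type the schema validation accepts; the estimator's training code must also know how to consume that type.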
Re: Spark ML Pipeline inaccessible types
Hi Martin, here are 2 possibilities to overcome this:

1) Put your logic into the org.apache.spark package in your project - then everything would be accessible.

2) Dirty trick:

object SparkVector extends HashingTF {
  val VectorUDT: DataType = outputDataType
}

then you can do this:

StructField(vectorTypeColumn, SparkVector.VectorUDT, false)

Thanks,
Peter Rudenko

On 2015-03-25 13:14, zapletal-mar...@email.cz wrote:
[...]
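Spelled out a little further: the trick works because outputDataType is a protected member inherited by HashingTF that returns a VectorUDT instance, so extending HashingTF from user code leaks that instance past the private[spark] modifier. A sketch, where the "features" column name and the surrounding schema are illustrative assumptions:

```scala
import org.apache.spark.ml.feature.HashingTF
import org.apache.spark.sql.types.{DataType, StructField, StructType}

// Extending HashingTF gives access to its protected outputDataType,
// which is an instance of the otherwise inaccessible VectorUDT.
object SparkVector extends HashingTF {
  val VectorUDT: DataType = outputDataType
}

// The leaked DataType can then be used wherever a schema needs the
// vector column type, e.g. in a custom Transformer's output schema.
val schema = StructType(Seq(
  StructField("features", SparkVector.VectorUDT, nullable = false)
))
```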
Re: Spark ML Pipeline inaccessible types
Thanks Peter,

I ended up doing something similar. However, I consider both of the approaches you mentioned bad practices, which is why I was looking for a solution directly supported by the current code. I can work with that for now, but it does not seem to be the proper solution.

Regards,
Martin

-- Original message --
From: Peter Rudenko petro.rude...@gmail.com
To: zapletal-mar...@email.cz, Sean Owen so...@cloudera.com
Date: 25. 3. 2015 13:28:38
Subject: Re: Spark ML Pipeline inaccessible types
[...]
Re: Spark ML Pipeline inaccessible types
Sean,

thanks for your response. I am familiar with NoSuchMethodException in general, but I think that is not the case this time. The code actually attempts to get a parameter by name using val m = this.getClass.getMethod(paramName). This may be a bug, but it is only a side effect of the real problem I am facing.

My issue is that VectorUDT is not accessible from user code, and therefore it is not possible to use a custom ML pipeline with the existing Predictors (see the last two paragraphs of my first email).

Best Regards,
Martin

-- Original message --
From: Sean Owen so...@cloudera.com
To: zapletal-mar...@email.cz
Date: 25. 3. 2015 11:05:54
Subject: Re: Spark ML Pipeline inaccessible types
[...]
Re: Spark ML Pipeline inaccessible types
NoSuchMethodError in general means that your runtime and compile-time environments are different. I think you first need to make sure you don't have mismatching versions of Spark.

On Wed, Mar 25, 2015 at 11:00 AM, zapletal-mar...@email.cz wrote:
[...]
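For archive readers, a sketch of the kind of setup that triggers the failure Martin describes; myFeatureTransformer and trainingDF are assumed to exist and are not defined here:

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.regression.LinearRegression

// A user-defined Transformer can only emit public column types, but
// LinearRegression's schema validation requires the features column to
// be the private[spark] VectorUDT. The failed type check is what
// surfaces as the NoSuchMethodException above rather than a clear
// "requirement failed" message.
val lr = new LinearRegression()
val pipeline = new Pipeline().setStages(Array(myFeatureTransformer, lr))

// Fails during schema validation when the features column is not VectorUDT:
// pipeline.fit(trainingDF)
```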