Re: Spark ML Pipeline inaccessible types

2015-03-27 Thread Xiangrui Meng
Hi Martin,

Could you attach the code snippet and the stack trace? The default
implementation of some methods uses reflection, which may be the
cause.

Best,
Xiangrui

On Wed, Mar 25, 2015 at 3:18 PM,  zapletal-mar...@email.cz wrote:
 Thanks Peter,

 I ended up doing something similar. I however consider both the approaches
 you mentioned bad practices which is why I was looking for a solution
 directly supported by the current code.

 I can work with that now, but it does not seem to be the proper solution.

 Regards,
 Martin

 -- Původní zpráva --
 Od: Peter Rudenko petro.rude...@gmail.com
 Komu: zapletal-mar...@email.cz, Sean Owen so...@cloudera.com
 Datum: 25. 3. 2015 13:28:38


 Předmět: Re: Spark ML Pipeline inaccessible types


 Hi Martin, here are two possibilities to work around this:

 1) Put your logic into the org.apache.spark package in your project - then
 everything would be accessible.
 2) Dirty trick:

  object SparkVector extends HashingTF {
    val VectorUDT: DataType = outputDataType
  }

 then you can use it like this:

  StructField(vectorTypeColumn, SparkVector.VectorUDT, false)

 Thanks,
 Peter Rudenko

 On 2015-03-25 13:14, zapletal-mar...@email.cz wrote:

 Sean,

 thanks for your response. I am familiar with NoSuchMethodException in
 general, but I think it is not the case this time. The code actually
 attempts to get the parameter by name using val m =
 this.getClass.getMethod(paramName).

 This may be a bug, but it is only a side effect caused by the real problem I
 am facing. My issue is that VectorUDT is not accessible by user code and
 therefore it is not possible to use custom ML pipeline with the existing
 Predictors (see the last two paragraphs in my first email).

 Best Regards,
 Martin

 -- Původní zpráva --
 Od: Sean Owen so...@cloudera.com
 Komu: zapletal-mar...@email.cz
 Datum: 25. 3. 2015 11:05:54
 Předmět: Re: Spark ML Pipeline inaccessible types


 NoSuchMethodError in general means that your runtime and compile-time
 environments are different. I think you need to first make sure you
 don't have mismatching versions of Spark.

 On Wed, Mar 25, 2015 at 11:00 AM, zapletal-mar...@email.cz wrote:
 Hi,

 I have started implementing a machine learning pipeline using Spark 1.3.0
 and the new pipelining API and DataFrames. I got to a point where I have
 my
 training data set prepared using a sequence of Transformers, but I am
 struggling to actually train a model and use it for predictions.

 I am getting a java.lang.NoSuchMethodException:
 org.apache.spark.ml.regression.LinearRegression.myFeaturesColumnName()
 exception thrown at checkInputColumn method in Params trait when using a
 Predictor (LinearRegression in my case, but that should not matter). This
 looks like a bug - the exception is thrown when executing
 getParam(colName)
 when the require(actualDataType.equals(datatype), ...) requirement is not
 met so the expected requirement failed exception is not thrown and is
 hidden
 by the unexpected NoSuchMethodException instead. I can raise a bug if this
 really is an issue and I am not using something incorrectly.
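 The masking described above can be reproduced with plain JVM reflection,
 independent of Spark. The class and helper below are made up for
 illustration; they only mimic the shape of the lookup in Params.getParam:

```scala
// Stand-in for a Params-style class whose param accessors are looked up
// by name via reflection (hypothetical, not the real Spark code).
class MyModel {
  def featuresCol(): String = "features"
}

// Mimics the reflective lookup by method name.
def getParamValue(target: AnyRef, paramName: String): String =
  try {
    target.getClass.getMethod(paramName).invoke(target).asInstanceOf[String]
  } catch {
    // Without translating it here, the caller sees the raw
    // NoSuchMethodException instead of a clear "no such param" error --
    // exactly the masking described above.
    case _: NoSuchMethodException =>
      throw new IllegalArgumentException(s"No param named $paramName")
  }

println(getParamValue(new MyModel, "featuresCol"))  // prints "features"
```

 A lookup for a name with no matching accessor (e.g.
 "myFeaturesColumnName") takes the catch branch, which is the translation
 step the reported code path appears to be missing.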

 The problem I am facing however is that the Predictor expects features to
 have VectorUDT type as defined in Predictor class (protected def
 featuresDataType: DataType = new VectorUDT). But since this type is
 private[spark] my Transformer can not prepare features with this type
 which
 then correctly results in the exception above when I use a different type.

 Is there a way to define a custom Pipeline that would be able to use the
 existing Predictors without having to bypass the access modifiers or
 reimplement something or is the pipelining API not yet expected to be used
 in this way?

 Thanks,
 Martin



 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




Re: Spark ML Pipeline inaccessible types

2015-03-27 Thread Joseph Bradley
Hi Martin,

In the short term: Would you be able to work with a type other than
Vector?  If so, then you can override the *Predictor* class's *protected
def featuresDataType: DataType* with a DataType which fits your
purpose.  If you need Vector, then you might have to use a hack like Peter
suggested.
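The shape of that override can be sketched with stand-in types; these are
not the real Spark classes, and featuresDataType below merely mimics the
protected hook in Predictor:

```scala
// Stand-ins for Spark's DataType values -- not the real API.
sealed trait DataType
case object DoubleType extends DataType
case object ArraysOfDouble extends DataType

// Stand-in Predictor: schema validation is driven by a protected hook.
abstract class PredictorSketch {
  protected def featuresDataType: DataType = DoubleType
  def acceptsFeatures(actual: DataType): Boolean = actual == featuresDataType
}

// Overriding the protected hook lets a subclass accept the type its
// upstream Transformer actually produces.
class ArrayFeaturesPredictor extends PredictorSketch {
  override protected def featuresDataType: DataType = ArraysOfDouble
}
```

Against the real API the idea is the same: subclass the predictor and
override protected def featuresDataType to return the DataType your
pipeline emits.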

In the long term: VectorUDT should indeed be made public, but that will
have to wait until the next release.

Thanks for the feedback,
Joseph

On Fri, Mar 27, 2015 at 11:12 AM, Xiangrui Meng men...@gmail.com wrote:

 Hi Martin,

 Could you attach the code snippet and the stack trace? The default
 implementation of some methods uses reflection, which may be the
 cause.

 Best,
 Xiangrui





Re: Spark ML Pipeline inaccessible types

2015-03-25 Thread Peter Rudenko

Hi Martin, here are two possibilities to work around this:

1) Put your logic into the org.apache.spark package in your project - then
everything would be accessible.

2) Dirty trick:

    object SparkVector extends HashingTF {
      val VectorUDT: DataType = outputDataType
    }

then you can use it like this:

    StructField(vectorTypeColumn, SparkVector.VectorUDT, false)
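The trick works because a subclass can re-export a protected member through
a public field. Stripped of the Spark specifics, the mechanism looks like
this (stand-in classes, not the real HashingTF or VectorUDT):

```scala
// Stand-ins for a DataType and for HashingTF, whose outputDataType
// member is protected, so plain user code cannot read it directly.
sealed trait DataType
case object VectorLikeType extends DataType

class HashingTFSketch {
  protected def outputDataType: DataType = VectorLikeType
}

// The subclass gains access to the protected member, and the public val
// re-exports it to the rest of the user's code.
object SparkVectorSketch extends HashingTFSketch {
  val VectorUDT: DataType = outputDataType
}
```

User code can then refer to SparkVectorSketch.VectorUDT wherever a
DataType is required, e.g. in the StructField for the features column.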

Thanks,
Peter Rudenko

On 2015-03-25 13:14, zapletal-mar...@email.cz wrote:


Sean,

thanks for your response. I am familiar with NoSuchMethodException
in general, but I think it is not the case this time. The code
actually attempts to get the parameter by name using val m =
this.getClass.getMethod(paramName).


This may be a bug, but it is only a side effect caused by the real 
problem I am facing. My issue is that VectorUDT is not accessible by 
user code and therefore it is not possible to use custom ML pipeline 
with the existing Predictors (see the last two paragraphs in my first 
email).


Best Regards,
Martin



Re: Spark ML Pipeline inaccessible types

2015-03-25 Thread zapletal-martin
Thanks Peter,

I ended up doing something similar. I however consider both the approaches
you mentioned bad practices, which is why I was looking for a solution
directly supported by the current code.

I can work with that now, but it does not seem to be the proper solution.

Regards,
Martin

-- Původní zpráva --
Od: Peter Rudenko petro.rude...@gmail.com
Komu: zapletal-mar...@email.cz, Sean Owen so...@cloudera.com
Datum: 25. 3. 2015 13:28:38
Předmět: Re: Spark ML Pipeline inaccessible types




 Hi Martin, here are two possibilities to work around this:

 1) Put your logic into the org.apache.spark package in your project - then
 everything would be accessible.
 2) Dirty trick:

  object SparkVector extends HashingTF {
    val VectorUDT: DataType = outputDataType
  }

 then you can use it like this:

  StructField(vectorTypeColumn, SparkVector.VectorUDT, false)

 Thanks,
 Peter Rudenko






Re: Spark ML Pipeline inaccessible types

2015-03-25 Thread zapletal-martin
Sean,

thanks for your response. I am familiar with NoSuchMethodException in
general, but I think it is not the case this time. The code actually
attempts to get the parameter by name using val m =
this.getClass.getMethod(paramName).

This may be a bug, but it is only a side effect caused by the real problem I
am facing. My issue is that VectorUDT is not accessible by user code and
therefore it is not possible to use a custom ML pipeline with the existing
Predictors (see the last two paragraphs in my first email).

Best Regards,
Martin



-- Původní zpráva --
Od: Sean Owen so...@cloudera.com
Komu: zapletal-mar...@email.cz
Datum: 25. 3. 2015 11:05:54
Předmět: Re: Spark ML Pipeline inaccessible types

NoSuchMethodError in general means that your runtime and compile-time
environments are different. I think you need to first make sure you
don't have mismatching versions of Spark.



Re: Spark ML Pipeline inaccessible types

2015-03-25 Thread Sean Owen
NoSuchMethodError in general means that your runtime and compile-time
environments are different. I think you need to first make sure you
don't have mismatching versions of Spark.

On Wed, Mar 25, 2015 at 11:00 AM,  zapletal-mar...@email.cz wrote:
 Hi,

 I have started implementing a machine learning pipeline using Spark 1.3.0
 and the new pipelining API and DataFrames. I got to a point where I have my
 training data set prepared using a sequence of Transformers, but I am
 struggling to actually train a model and use it for predictions.

 I am getting a java.lang.NoSuchMethodException:
 org.apache.spark.ml.regression.LinearRegression.myFeaturesColumnName()
 exception thrown at the checkInputColumn method in the Params trait when using a
 Predictor (LinearRegression in my case, but that should not matter). This
 looks like a bug - the exception is thrown when executing getParam(colName)
 when the require(actualDataType.equals(datatype), ...) requirement is not
 met, so the expected "requirement failed" exception is not thrown and is hidden
 by the unexpected NoSuchMethodException instead. I can raise a bug if this
 really is an issue and I am not using something incorrectly.

 The problem I am facing, however, is that the Predictor expects features to
 have the VectorUDT type as defined in the Predictor class (protected def
 featuresDataType: DataType = new VectorUDT). But since this type is
 private[spark] my Transformer cannot prepare features with this type, which
 then correctly results in the exception above when I use a different type.

 Is there a way to define a custom Pipeline that would be able to use the
 existing Predictors without having to bypass the access modifiers or
 reimplement something or is the pipelining API not yet expected to be used
 in this way?

 Thanks,
 Martin


