Re: Dropping support for earlier Hadoop versions in Spark 2.0?

2015-11-21 Thread Sean Owen
On Fri, Nov 20, 2015 at 10:39 PM, Reynold Xin  wrote:
> I don't think we should look at it from only maintenance point of view --
> because in that case the answer is clearly supporting as few versions as
> possible (or just rm -rf spark source code and call it a day). It is a
> tradeoff between the number of users impacted and the maintenance burden.

The upside to supporting only newer versions is less maintenance (no
small thing given how sprawling the build is) and more freedom to use
newer functionality. The downside is, of course, not letting older
Hadoop users use the latest Spark.


> 1. Can Hadoop 2.6 client read Hadoop 2.4 / 2.3?

If the question is really about HDFS, then I think the answer is
"yes". The big compatibility problem has been protobuf, but all of
Hadoop 2.2+ is on protobuf 2.5.


> 3. Can Hadoop 2.6+ YARN work on older versions of YARN clusters?

Same client/server question? This is where I'm not as clear. I think
the answer is 'yes' to the extent you're using functionality that
existed in the older YARN. Of course, using some newer API against
old clusters doesn't work.


> 4. (for Hadoop vendors) When did/will support for Hadoop 2.4 and below stop?
> To what extent do you care about running Spark on older Hadoop clusters.

CDH 5.3 = Hadoop 2.6, FWIW, which was out about a year ago. Support
continues for a long time in the sense that CDH 5 will be supported
for years. However, Spark 2 would never be shipped / supported in CDH
5. So it's not an issue for Spark 2; Spark 2 will probably be
"supported" only against Hadoop 3, or at least something later in 2.x
than 2.6.

The question here is really about whether Spark should specially
support, say, Spark 2 + CDH 5.0 or something. My experience so far is
that Spark has not really supported the older vendor versions it
claims to, and I'd rather not pretend it does. So this doesn't strike
me as a great reason either.

This is roughly why supporting, say, 2.6 (a safely recent version)
seems like an OK place to draw the line 6-8 months from now.




Re: Dropping support for earlier Hadoop versions in Spark 2.0?

2015-11-21 Thread Steve Loughran

> On 20 Nov 2015, at 21:39, Reynold Xin  wrote:
> 
> OK I'm not exactly asking for a vote here :)
> 
> I don't think we should look at it from only maintenance point of view -- 
> because in that case the answer is clearly supporting as few versions as 
> possible (or just rm -rf spark source code and call it a day). It is a 
> tradeoff between the number of users impacted and the maintenance burden.
> 
> So a few questions for those more familiar with Hadoop:
> 
> 1. Can Hadoop 2.6 client read Hadoop 2.4 / 2.3? 
> 

Yes, at the HDFS level.

There are some special cases where HDFS stops a 2.2-2.5 client from
talking to Hadoop 2.6:


- HDFS at-rest encryption needs a client that can decode it (2.6.x+)
- HDFS erasure coding will need a later version (2.8?)

If you turn SASL on in your datanodes, your DNs don't need to come up on a port
< 1024, but Hadoop < 2.6 clients stop being able to talk to HDFS at that
point.



> 2. If the answer to 1 is yes, are there known, major issues with backward 
> compatibility?
> 

Hadoop native libs, every time. Guava, Jackson and protobuf can be managed with
shading, but hadoop.{so,dll} is a real problem. A hadoop-2.6 JAR will use
native methods in the Hadoop native lib which, if not loaded, will break the
app. This is a pain, as nobody includes that native lib with their Java
binaries, and who can even predict which one they'd have to include. As a
consequence, I'd really advise against trying to run an app built with the 2.6
JARs inside a YARN cluster < 2.6. You can certainly talk to HDFS and the YARN
services, but there's a risk a codepath will hit a native method that isn't
there.
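
If in doubt, you can at least probe whether the native lib made it into the
JVM before trusting those codepaths. A minimal sketch (NativeCodeLoader ships
in hadoop-common; whether it loads depends on java.library.path at runtime,
and "NativeLibCheck" is just an illustrative name):

    // Sketch: report whether libhadoop actually loaded into this JVM.
    import org.apache.hadoop.util.NativeCodeLoader

    object NativeLibCheck {
      def main(args: Array[String]): Unit = {
        if (NativeCodeLoader.isNativeCodeLoaded) {
          println("libhadoop loaded: native codepaths are available")
        } else {
          println("libhadoop NOT loaded: native-only codepaths may fail at runtime")
        }
      }
    }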


It's trouble the other way too: even though we try not to break existing code
by moving/renaming native methods, it can happen.

The last time someone did this in a big way, I was the first to find it, in
HADOOP-11064; the changes were reverted/altered, but there was no official
declaration that compatibility at the JNI layer will be maintained. Apparently
you can't guarantee it across JVM versions either.

We really need a lib versioning story, which is what HADOOP-11127 covers.

> 3. Can Hadoop 2.6+ YARN work on older versions of YARN clusters?
> 

I'd say no, with classpath and hadoop native being the failure points.

There's also feature completeness; Hadoop 2.6 was the first version with all
the YARN-896 work for long-lived services.


> 4. (for Hadoop vendors) When did/will support for Hadoop 2.4 and below stop? 
> To what extent do you care about running Spark on older Hadoop clusters.
> 
> 

I don't know, and I probably don't want to make any forward-looking statements
anyway. But I don't even know how well supported 2.4 is today; 2.6 is the one
that still gets bug fixes out from the ASF. I can see it lasting a while.


What essentially happens is that we provide bug fixes to the existing releases, 
but for anything new: upgrade.

Assuming that policy continues (disclaimer: personal opinions, etc), then any 
Spark 2.0 release would be rebuilt against all the JARs which the rest of that 
version of HDP would use, and that's the only version we'd recommend using.






Re: Using spark MLlib without installing Spark

2015-11-21 Thread Rad Gruchalski
Bowen,  

One project to look at could be spark-notebook: 
https://github.com/andypetrella/spark-notebook
It uses Spark in the way you intend to use it.

Kind regards,

Radek Gruchalski

ra...@gruchalski.com
de.linkedin.com/in/radgruchalski/

Confidentiality:
This communication is intended for the above-named person and may be 
confidential and/or legally privileged.
If it has come to you in error you must take no action based on it, nor must 
you copy or show it to anyone; please delete/destroy and inform the sender 
immediately.



On Sunday, 22 November 2015 at 00:38, bowen zhang wrote:

> Hi folks,
> I am a big fan of Spark's Mllib package. I have a java web app where I want 
> to run some ml jobs inside the web app. My question is: is there a way to 
> just import spark-core and spark-mllib jars to invoke my ML jobs without 
> installing the entire Spark package? All the tutorials related to Spark seem
> to indicate installing Spark is a precondition for this.
>  
> Thanks,
> Bowen



Re: Using spark MLlib without installing Spark

2015-11-21 Thread bowen zhang
Thanks Rad for the info. I looked into the repo and see some .snb files using
spark mllib. Can you give me a more specific place to look for how to invoke
the mllib functions? What if I just want to invoke some of the ML functions in
my HelloWorld.java?

From: Rad Gruchalski
To: bowen zhang
Cc: "dev@spark.apache.org"
Sent: Saturday, November 21, 2015 3:43 PM
Subject: Re: Using spark MLlib without installing Spark
Bowen,

One project to look at could be spark-notebook:
https://github.com/andypetrella/spark-notebook
It uses Spark in the way you intend to use it.

Kind regards,
Radek Gruchalski
ra...@gruchalski.com
de.linkedin.com/in/radgruchalski/

Confidentiality:
This communication is intended for the above-named person and may be
confidential and/or legally privileged.
If it has come to you in error you must take no action based on it, nor must
you copy or show it to anyone; please delete/destroy and inform the sender
immediately.

On Sunday, 22 November 2015 at 00:38, bowen zhang wrote:

Hi folks,
I am a big fan of Spark's Mllib package. I have a java web app where I want to
run some ml jobs inside the web app. My question is: is there a way to just
import spark-core and spark-mllib jars to invoke my ML jobs without installing
the entire Spark package? All the tutorials related to Spark seem to indicate
installing Spark is a precondition for this.

Thanks,
Bowen

Re: Using spark MLlib without installing Spark

2015-11-21 Thread Rad Gruchalski
Bowen,  

What Andy is doing in the notebook is a slightly different thing. He's using
sbt to bring in all the spark jars (core, mllib, repl, what have you). You
could use maven for that. He then creates a repl and submits all the spark
code into it. Pretty sure the spark unit tests cover similar use cases, maybe
not mllib per se, but this kind of submission.
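
For example, something like this in build.sbt pulls everything in (a sketch
only; "mllib-embedded" is a made-up project name, and the Spark/Scala versions
are assumptions, so adjust them to match your setup):

    // build.sbt: minimal sketch bringing in spark-core and spark-mllib
    name := "mllib-embedded"

    scalaVersion := "2.10.5"

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core"  % "1.5.2",
      "org.apache.spark" %% "spark-mllib" % "1.5.2"
    )
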
Kind regards,

Radek Gruchalski

ra...@gruchalski.com
de.linkedin.com/in/radgruchalski/

Confidentiality:
This communication is intended for the above-named person and may be 
confidential and/or legally privileged.
If it has come to you in error you must take no action based on it, nor must 
you copy or show it to anyone; please delete/destroy and inform the sender 
immediately.



On Sunday, 22 November 2015 at 01:01, bowen zhang wrote:

> Thanks Rad for the info. I looked into the repo and see some .snb files
> using spark mllib. Can you give me a more specific place to look for how to
> invoke the mllib functions? What if I just want to invoke some of the ML
> functions in my HelloWorld.java?
>  
> From: Rad Gruchalski
> To: bowen zhang
> Cc: "dev@spark.apache.org"
> Sent: Saturday, November 21, 2015 3:43 PM
> Subject: Re: Using spark MLlib without installing Spark
>  
> Bowen,  
>  
> One project to look at could be spark-notebook: 
> https://github.com/andypetrella/spark-notebook
> It uses Spark in the way you intend to use it.
>  
>  
>  
> Kind regards,
> Radek Gruchalski
> ra...@gruchalski.com
> de.linkedin.com/in/radgruchalski/
>  
> Confidentiality:
> This communication is intended for the above-named person and may be 
> confidential and/or legally privileged.
> If it has come to you in error you must take no action based on it, nor must 
> you copy or show it to anyone; please delete/destroy and inform the sender 
> immediately.  
>  
>  
> On Sunday, 22 November 2015 at 00:38, bowen zhang wrote:
> > Hi folks,
> > I am a big fan of Spark's Mllib package. I have a java web app where I want 
> > to run some ml jobs inside the web app. My question is: is there a way to 
> > just import spark-core and spark-mllib jars to invoke my ML jobs without 
> > installing the entire Spark package? All the tutorials related to Spark
> > seem to indicate installing Spark is a precondition for this.
> >  
> > Thanks,
> > Bowen
>  
>  
>  



Using spark MLlib without installing Spark

2015-11-21 Thread bowen zhang
Hi folks,

I am a big fan of Spark's Mllib package. I have a java web app where I want to
run some ml jobs inside the web app. My question is: is there a way to just
import spark-core and spark-mllib jars to invoke my ML jobs without installing
the entire Spark package? All the tutorials related to Spark seem to indicate
installing Spark is a precondition for this.

Thanks,
Bowen


Re: Using spark MLlib without installing Spark

2015-11-21 Thread Reynold Xin
You can use MLlib and Spark directly without "installing anything". Just
run Spark in local mode.
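
For instance, a minimal sketch (assuming spark-core and spark-mllib 1.5.x on
the classpath; shown in Scala, and the Java API is analogous; "HelloMLlib" is
just an illustrative name):

    // Embed Spark and MLlib in-process: "local[*]" runs Spark inside
    // this JVM on all available cores, so nothing has to be installed.
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.clustering.KMeans
    import org.apache.spark.mllib.linalg.Vectors

    object HelloMLlib {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("HelloMLlib").setMaster("local[*]")
        val sc = new SparkContext(conf)

        val points = sc.parallelize(Seq(
          Vectors.dense(0.0, 0.0), Vectors.dense(0.1, 0.1),
          Vectors.dense(9.0, 9.0), Vectors.dense(9.1, 9.1)))

        // Cluster the four points into k = 2 groups over 10 iterations.
        val model = KMeans.train(points, k = 2, maxIterations = 10)
        model.clusterCenters.foreach(c => println(s"Cluster center: $c"))

        sc.stop()
      }
    }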


On Sat, Nov 21, 2015 at 4:05 PM, Rad Gruchalski wrote:

> Bowen,
>
> What Andy is doing in the notebook is a slightly different thing. He's
> using sbt to bring in all the spark jars (core, mllib, repl, what have you).
> You could use maven for that. He then creates a repl and submits all the
> spark code into it.
> Pretty sure the spark unit tests cover similar use cases, maybe not mllib
> per se, but this kind of submission.
>
> Kind regards,
> Radek Gruchalski
> ra...@gruchalski.com 
> de.linkedin.com/in/radgruchalski/
>
>
> Confidentiality: This communication is intended for the above-named
> person and may be confidential and/or legally privileged.
> If it has come to you in error you must take no action based on it, nor
> must you copy or show it to anyone; please delete/destroy and inform the
> sender immediately.
>
> On Sunday, 22 November 2015 at 01:01, bowen zhang wrote:
>
> Thanks Rad for the info. I looked into the repo and see some .snb files
> using spark mllib. Can you give me a more specific place to look for how to
> invoke the mllib functions? What if I just want to invoke some of the ML
> functions in my HelloWorld.java?
>
> --
> From: Rad Gruchalski
> To: bowen zhang
> Cc: "dev@spark.apache.org"
> Sent: Saturday, November 21, 2015 3:43 PM
> Subject: Re: Using spark MLlib without installing Spark
>
> Bowen,
>
> One project to look at could be spark-notebook:
> https://github.com/andypetrella/spark-notebook
> It uses Spark in the way you intend to use it.
> Kind regards,
> Radek Gruchalski
> ra...@gruchalski.com 
> de.linkedin.com/in/radgruchalski/
>
>
> Confidentiality: This communication is intended for the above-named
> person and may be confidential and/or legally privileged.
> If it has come to you in error you must take no action based on it, nor
> must you copy or show it to anyone; please delete/destroy and inform the
> sender immediately.
>
>
> On Sunday, 22 November 2015 at 00:38, bowen zhang wrote:
>
> Hi folks,
> I am a big fan of Spark's Mllib package. I have a java web app where I
> want to run some ml jobs inside the web app. My question is: is there a way
> to just import spark-core and spark-mllib jars to invoke my ML jobs without
> installing the entire Spark package? All the tutorials related to Spark
> seem to indicate installing Spark is a precondition for this.
>
> Thanks,
> Bowen
>
>
>
>
>
>


Re: Unhandled case in VectorAssembler

2015-11-21 Thread Benjamin Fradet
Will do, thanks for your input.
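
Something along these lines, perhaps (a rough sketch of the check I have in
mind, not the final patch; "checkInputType" is just an illustrative helper
name):

    // Sketch: replace the bare pattern match (which throws a cryptic
    // scala.MatchError) with a descriptive error for unsupported types.
    import org.apache.spark.SparkException
    import org.apache.spark.mllib.linalg.VectorUDT
    import org.apache.spark.sql.types._

    def checkInputType(dataType: DataType, colName: String): Unit =
      dataType match {
        case DoubleType | BooleanType => // supported
        case _: NumericType | _: VectorUDT => // supported
        case other =>
          throw new SparkException(
            s"VectorAssembler does not support column '$colName' of type " +
            s"$other. Supported types: numeric, boolean, vector.")
      }
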
On 21 Nov 2015 2:42 a.m., "Joseph Bradley"  wrote:

> Yes, please, could you send a JIRA (and PR)?  A custom error message would
> be better.
> Thank you!
> Joseph
>
> On Fri, Nov 20, 2015 at 2:39 PM, BenFradet wrote:
>
>> Hey there,
>>
>> I noticed that there is an unhandled case in the transform method of
>> VectorAssembler if one of the input columns doesn't have one of the
>> supported types: DoubleType, NumericType, BooleanType, or VectorUDT.
>>
>> So, if you try to transform a column of StringType you get a cryptic
>> "scala.MatchError: StringType".
>> I was wondering if we shouldn't throw a custom exception indicating that
>> this is not a supported type.
>>
>> I can submit a jira and pr if needed.
>>
>> Best regards,
>> Ben.