Re: Any plans to migrate Transformer API to Spark SQL (closer to DataFrames)?

2016-03-25 Thread Joseph Bradley
There have been some comments about using Pipelines outside of ML, but I
have not yet seen a real need for it.  If a user does want to use Pipelines
for non-ML tasks, they can still use Transformers + PipelineModels.  Will
that work?
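
For example, a SQLTransformer gives you a plain ETL step with no ML
involved. A minimal, untested sketch against the 1.6 APIs, spark-shell
style (the column names and the input DataFrame `df` are my assumptions):

  import org.apache.spark.ml.{Pipeline, PipelineModel}
  import org.apache.spark.ml.feature.SQLTransformer
  import org.apache.spark.sql.DataFrame

  // A non-ML "pipeline": one SQLTransformer doing a cleanup step.
  // Assumes df has "id" and "text" columns.
  val cleanup = new SQLTransformer().setStatement(
    "SELECT id, trim(lower(text)) AS text FROM __THIS__")

  def clean(df: DataFrame): DataFrame = {
    val model: PipelineModel =
      new Pipeline().setStages(Array(cleanup)).fit(df)
    model.transform(df)
  }

Nothing in that pipeline is ML-specific, yet it composes with further
stages through the same Pipeline/PipelineModel machinery.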

On Fri, Mar 25, 2016 at 8:05 AM, Jacek Laskowski  wrote:

> Hi,
>
> After a few weeks with spark.ml, I have come to the conclusion that the
> Transformer concept from the Pipeline API (spark.ml/MLlib) should be part
> of DataFrame (SQL), where it fits better. Are there any plans to
> migrate the Transformer API (ML) to DataFrame (SQL)?
>
> Pozdrawiam,
> Jacek Laskowski
> 
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski


Re: [discuss] ending support for Java 7 in Spark 2.0

2016-03-25 Thread Koert Kuipers
i asked around a little, and the general trend at our clients seems to be
that they plan to upgrade their clusters to java 8 within the year.

so with that in mind i wish this were a little later (i would have preferred
a java-8-only spark at the end of the year). but since a major spark version
only comes around every so often, i guess it makes sense to make the jump
now. so:
+1 on dropping java 7
+1 on dropping scala 2.10

i would especially like to point out (as others have before me) that nobody
has come in and said they actually need scala 2.10 support, so that seems
like the easiest choice of all to me.

On Fri, Mar 25, 2016 at 10:03 AM, Andrew Ray  wrote:

> +1 on removing Java 7 and Scala 2.10 support.
>
> It looks to be entirely possible to support Java 8 containers in a YARN
> cluster otherwise running Java 7 (example code for alt JAVA_HOME
> https://issues.apache.org/jira/secure/attachment/12671739/YARN-1964.patch)
> so really there should be no big problem. Even if that somehow doesn't work
> I'm still +1 as the benefits are so large.
>
> I'd also like to point out that it is completely trivial to have multiple
> versions of Spark running concurrently on a YARN cluster. At my previous
> (extremely large) employer we had almost every release since 1.0 installed,
> with the latest being default and production apps pinned to a specific
> version. So if you want to keep using some Scala 2.10-only library or just
> don't want to migrate to Java 8, feel free to continue using Spark 1.x for
> those applications.
>
> IMHO we need to move on from EOL stuff to make room for the future (Java
> 9, Scala 2.12) and Spark 2.0 is the only chance we are going to have to do
> so for a long time.
>
> --Andrew
>
> On Thu, Mar 24, 2016 at 10:55 PM, Mridul Muralidharan 
> wrote:
>
>>
>> I do agree w.r.t. scala 2.10 as well; similar arguments apply (though
>> there is a nuanced diff - source compatibility for scala vs binary
>> compatibility wrt Java).
>> Was there a proposal which did not go through? Not sure if I missed it.
>>
>> Regards
>> Mridul
>>
>>
>> On Thursday, March 24, 2016, Koert Kuipers  wrote:
>>
>>> i think that logic is reasonable, but then the same should also apply to
>>> scala 2.10, which is also unmaintained/unsupported at this point (basically
>>> has been since march 2015 except for one hotfix due to a license
>>> incompatibility)
>>>
>>> who wants to support scala 2.10 three years after they did the last
>>> maintenance release?
>>>
>>>
>>> On Thu, Mar 24, 2016 at 9:59 PM, Mridul Muralidharan 
>>> wrote:
>>>
Removing compatibility (with a jdk, etc.) can be done with a major
release - given that 7 was EOLed a while back and is now unsupported,
we have to decide whether we drop support for it in 2.0 or in 3.0
(2+ years from now).

Given the functionality & performance benefits of going to jdk8, the
future enhancements relevant in the 2.x timeframe (scala, dependencies)
which require it, and the simplicity wrt code, test & support, it looks
like a good checkpoint to drop jdk7 support.

As already mentioned in the thread, existing yarn clusters are
unaffected if they want to continue running jdk7 and yet use
spark2 (install jdk8 on all nodes and use it via JAVA_HOME, or worst
case distribute jdk8 as an archive - suboptimal).
I am unsure about mesos (standalone might be an easier upgrade, I guess?).

The proposal is for the 1.6.x line to continue to be supported with
critical fixes; newer features will require 2.x and hence jdk8.

 Regards
 Mridul


 On Thursday, March 24, 2016, Marcelo Vanzin 
 wrote:

> On Thu, Mar 24, 2016 at 4:50 PM, Reynold Xin 
> wrote:
> > If you want to go down that route, you should also ask somebody who has
> > had experience managing a large organization's applications and trying
> > to update the Scala version.
>
> I understand both sides. But if you look at what I've been asking
> since the beginning, it's all about the cost and benefits of dropping
> support for java 1.7.
>
> The biggest argument in your original e-mail is about testing. And the
> testing cost is much bigger for supporting scala 2.10 than it is for
> supporting java 1.7. If you read one of my earlier replies, it should
> even be possible to just do everything in a single job - compile for
> java 7 and still be able to test things in 1.8, including lambdas,
> which seems to be the main thing you were worried about.
>
>
> > On Thu, Mar 24, 2016 at 4:48 PM, Marcelo Vanzin  wrote:
> >>
> >> On Thu, Mar 24, 2016 at 4:46 PM, Reynold Xin  wrote:
> >> > Actually it's *way* harder to upgrade Scala from 2.10 to 2.11, than
> >> > upgrading the JVM runtime from 7 to 8, because Scala 2.10 and 2.11
> >> > are not binary compatible, whereas JVM 7 and 8 are binary compatible
> >> > except certain esoteric cases.

Re: SPARK-13843 and future of streaming backends

2016-03-25 Thread David Nalley

> As far as group / artifact name compatibility, at least in the case of
> Kafka we need different artifact names, and people are going to
> have to make changes to their build files for spark 2.0 anyway.   As
> far as keeping the actual classes in org.apache.spark so as not to break
> code despite the group name being different, I don't know whether that
> would be enforced by maven central, just looked at as poor taste, or
> met with the ASF suing for trademark violation :)


Sonatype has strict instructions to only permit org.apache.* artifacts to
originate from repository.apache.org. Exceptions to that must be approved
by the VP, Infrastructure.



[spark.ml] Why is ColumnPruner a private class?

2016-03-25 Thread Jacek Laskowski
Hi,

I came across `private class ColumnPruner` with "TODO(ekl) make this a
public transformer" in its scaladoc, cf.
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala#L317.

Why is this private, and is there a JIRA for the TODO(ekl)?
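
If it were public, I would expect something along these lines -- my own
minimal, untested sketch against the 1.6 Transformer API (the constructor
shape is a guess, not the actual private implementation):

  import org.apache.spark.ml.Transformer
  import org.apache.spark.ml.param.ParamMap
  import org.apache.spark.ml.util.Identifiable
  import org.apache.spark.sql.DataFrame
  import org.apache.spark.sql.types.StructType

  // Hypothetical public column pruner: drops the given columns.
  class ColumnPruner(override val uid: String, columnsToPrune: Set[String])
      extends Transformer {

    def this(columnsToPrune: Set[String]) =
      this(Identifiable.randomUID("columnPruner"), columnsToPrune)

    override def transform(dataset: DataFrame): DataFrame = {
      val remaining = dataset.columns.filterNot(columnsToPrune.contains)
      dataset.select(remaining.map(dataset.col): _*)
    }

    override def transformSchema(schema: StructType): StructType =
      StructType(schema.fields.filterNot(f => columnsToPrune.contains(f.name)))

    // No Params here, so a plain copy suffices for this sketch.
    override def copy(extra: ParamMap): ColumnPruner =
      new ColumnPruner(uid, columnsToPrune)
  }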

Pozdrawiam,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Any plans to migrate Transformer API to Spark SQL (closer to DataFrames)?

2016-03-25 Thread Jacek Laskowski
Hi,

After a few weeks with spark.ml, I have come to the conclusion that the
Transformer concept from the Pipeline API (spark.ml/MLlib) should be part
of DataFrame (SQL), where it fits better. Are there any plans to
migrate the Transformer API (ML) to DataFrame (SQL)?
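
To illustrate what I mean, a Transformer today boils down to a
DataFrame => DataFrame function. A sketch, assuming a DataFrame `df`
with a string column "text":

  import org.apache.spark.ml.feature.Tokenizer
  import org.apache.spark.sql.DataFrame

  // A Transformer is, in essence, a DataFrame => DataFrame function.
  def tokenize(df: DataFrame): DataFrame = {
    val tokenizer = new Tokenizer()
      .setInputCol("text")
      .setOutputCol("words")
    tokenizer.transform(df)
  }

Conceptually that sits right next to df.select and df.filter, which is
why DataFrame (SQL) feels like the better home for it.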

Pozdrawiam,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [discuss] ending support for Java 7 in Spark 2.0

2016-03-25 Thread Andrew Ray
+1 on removing Java 7 and Scala 2.10 support.

It looks to be entirely possible to support Java 8 containers in a YARN
cluster otherwise running Java 7 (example code for alt JAVA_HOME
https://issues.apache.org/jira/secure/attachment/12671739/YARN-1964.patch)
so really there should be no big problem. Even if that somehow doesn't work
I'm still +1 as the benefits are so large.
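
For instance, a single application can be pointed at a JDK 8 install
along these lines (untested sketch; the path is an assumption and has to
exist on every node, or be distributed as an archive instead):

  import org.apache.spark.{SparkConf, SparkContext}

  // Run this one application on Java 8 while the cluster default
  // stays Java 7. The JDK location below is hypothetical.
  val java8Home = "/usr/lib/jvm/java-8-openjdk"
  val conf = new SparkConf()
    .setAppName("java8-app")
    .set("spark.yarn.appMasterEnv.JAVA_HOME", java8Home) // YARN AM
    .set("spark.executorEnv.JAVA_HOME", java8Home)       // executors
  val sc = new SparkContext(conf)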

I'd also like to point out that it is completely trivial to have multiple
versions of Spark running concurrently on a YARN cluster. At my previous
(extremely large) employer we had almost every release since 1.0 installed,
with the latest being default and production apps pinned to a specific
version. So if you want to keep using some Scala 2.10-only library or just
don't want to migrate to Java 8, feel free to continue using Spark 1.x for
those applications.

IMHO we need to move on from EOL stuff to make room for the future (Java 9,
Scala 2.12) and Spark 2.0 is the only chance we are going to have to do so
for a long time.

--Andrew

On Thu, Mar 24, 2016 at 10:55 PM, Mridul Muralidharan 
wrote:

>
> I do agree w.r.t. scala 2.10 as well; similar arguments apply (though there
> is a nuanced diff - source compatibility for scala vs binary compatibility
> wrt Java).
> Was there a proposal which did not go through? Not sure if I missed it.
>
> Regards
> Mridul
>
>
> On Thursday, March 24, 2016, Koert Kuipers  wrote:
>
>> i think that logic is reasonable, but then the same should also apply to
>> scala 2.10, which is also unmaintained/unsupported at this point (basically
>> has been since march 2015 except for one hotfix due to a license
>> incompatibility)
>>
>> who wants to support scala 2.10 three years after they did the last
>> maintenance release?
>>
>>
>> On Thu, Mar 24, 2016 at 9:59 PM, Mridul Muralidharan 
>> wrote:
>>
>>> Removing compatibility (with a jdk, etc.) can be done with a major
>>> release - given that 7 was EOLed a while back and is now unsupported, we
>>> have to decide whether we drop support for it in 2.0 or in 3.0 (2+ years
>>> from now).
>>>
>>> Given the functionality & performance benefits of going to jdk8, the
>>> future enhancements relevant in the 2.x timeframe (scala, dependencies)
>>> which require it, and the simplicity wrt code, test & support, it looks
>>> like a good checkpoint to drop jdk7 support.
>>>
>>> As already mentioned in the thread, existing yarn clusters are
>>> unaffected if they want to continue running jdk7 and yet use
>>> spark2 (install jdk8 on all nodes and use it via JAVA_HOME, or worst case
>>> distribute jdk8 as an archive - suboptimal).
>>> I am unsure about mesos (standalone might be an easier upgrade, I guess?).
>>>
>>> The proposal is for the 1.6.x line to continue to be supported with
>>> critical fixes; newer features will require 2.x and hence jdk8.
>>>
>>> Regards
>>> Mridul
>>>
>>>
>>> On Thursday, March 24, 2016, Marcelo Vanzin  wrote:
>>>
On Thu, Mar 24, 2016 at 4:50 PM, Reynold Xin  wrote:
> If you want to go down that route, you should also ask somebody who has
> had experience managing a large organization's applications and trying
> to update the Scala version.

 I understand both sides. But if you look at what I've been asking
 since the beginning, it's all about the cost and benefits of dropping
 support for java 1.7.

 The biggest argument in your original e-mail is about testing. And the
 testing cost is much bigger for supporting scala 2.10 than it is for
 supporting java 1.7. If you read one of my earlier replies, it should
even be possible to just do everything in a single job - compile for
 java 7 and still be able to test things in 1.8, including lambdas,
 which seems to be the main thing you were worried about.


> On Thu, Mar 24, 2016 at 4:48 PM, Marcelo Vanzin  wrote:
>>
>> On Thu, Mar 24, 2016 at 4:46 PM, Reynold Xin  wrote:
>> > Actually it's *way* harder to upgrade Scala from 2.10 to 2.11, than
>> > upgrading the JVM runtime from 7 to 8, because Scala 2.10 and 2.11
>> > are not binary compatible, whereas JVM 7 and 8 are binary compatible
>> > except certain esoteric cases.
 >>
 >> True, but ask anyone who manages a large cluster how long it would
 >> take them to upgrade the jdk across their cluster and validate all
 >> their applications and everything... binary compatibility is a tiny
 >> drop in that bucket.
 >>
 >> --
 >> Marcelo
 >
 >



 --
 Marcelo

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org


>>


Re: Does SparkSQL have an official JDBC/ODBC driver?

2016-03-25 Thread Daniel Darabos
I haven't tried this, but I thought you could run the Thrift server in Spark
and then connect with the HiveServer2 JDBC driver:
http://spark.apache.org/docs/1.6.1/sql-programming-guide.html#running-the-thrift-jdbcodbc-server
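
Something like this should then work from any JVM client (untested
sketch; host, port and user are my assumptions, 10000 being the Thrift
server's default port, and the Hive JDBC driver must be on the classpath):

  import java.sql.DriverManager

  // Connect to Spark's Thrift JDBC server via the HiveServer2 driver.
  Class.forName("org.apache.hive.jdbc.HiveDriver")
  val conn = DriverManager.getConnection(
    "jdbc:hive2://localhost:10000/default", "spark", "")
  val stmt = conn.createStatement()
  val rs = stmt.executeQuery("SELECT 1 AS probe")
  while (rs.next()) println(rs.getInt("probe"))
  rs.close(); stmt.close(); conn.close()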

On Fri, Mar 25, 2016 at 7:57 AM, Reynold Xin  wrote:

> No - it is too painful to develop a jdbc/odbc driver.
>
>
> On Thu, Mar 24, 2016 at 11:56 PM, sage  wrote:
>
>> Hi all,
>> Does SparkSQL have an official JDBC/ODBC driver?
>> I have only found third-party ODBC/JDBC drivers, like Simba, and most
>> third-party ODBC/JDBC drivers are not free to use.
>


Re: [discuss] ending support for Java 7 in Spark 2.0

2016-03-25 Thread Mridul Muralidharan
I do agree w.r.t. scala 2.10 as well; similar arguments apply (though there
is a nuanced diff - source compatibility for scala vs binary compatibility
wrt Java).
Was there a proposal which did not go through? Not sure if I missed it.

Regards
Mridul

On Thursday, March 24, 2016, Koert Kuipers  wrote:

> i think that logic is reasonable, but then the same should also apply to
> scala 2.10, which is also unmaintained/unsupported at this point (basically
> has been since march 2015 except for one hotfix due to a license
> incompatibility)
>
> who wants to support scala 2.10 three years after they did the last
> maintenance release?
>
>
> On Thu, Mar 24, 2016 at 9:59 PM, Mridul Muralidharan  > wrote:
>
>> Removing compatibility (with a jdk, etc.) can be done with a major
>> release - given that 7 was EOLed a while back and is now unsupported, we
>> have to decide whether we drop support for it in 2.0 or in 3.0 (2+ years
>> from now).
>>
>> Given the functionality & performance benefits of going to jdk8, the
>> future enhancements relevant in the 2.x timeframe (scala, dependencies)
>> which require it, and the simplicity wrt code, test & support, it looks
>> like a good checkpoint to drop jdk7 support.
>>
>> As already mentioned in the thread, existing yarn clusters are unaffected
>> if they want to continue running jdk7 and yet use spark2 (install jdk8 on
>> all nodes and use it via JAVA_HOME, or worst case distribute jdk8 as an
>> archive - suboptimal).
>> I am unsure about mesos (standalone might be an easier upgrade, I guess?).
>>
>> The proposal is for the 1.6.x line to continue to be supported with
>> critical fixes; newer features will require 2.x and hence jdk8.
>>
>> Regards
>> Mridul
>>
>>
>> On Thursday, March 24, 2016, Marcelo Vanzin > > wrote:
>>
>>> On Thu, Mar 24, 2016 at 4:50 PM, Reynold Xin  wrote:
>>> > If you want to go down that route, you should also ask somebody who
>>> > has had experience managing a large organization's applications and
>>> > trying to update the Scala version.
>>>
>>> I understand both sides. But if you look at what I've been asking
>>> since the beginning, it's all about the cost and benefits of dropping
>>> support for java 1.7.
>>>
>>> The biggest argument in your original e-mail is about testing. And the
>>> testing cost is much bigger for supporting scala 2.10 than it is for
>>> supporting java 1.7. If you read one of my earlier replies, it should
>>> even be possible to just do everything in a single job - compile for
>>> java 7 and still be able to test things in 1.8, including lambdas,
>>> which seems to be the main thing you were worried about.
>>>
>>>
>>> > On Thu, Mar 24, 2016 at 4:48 PM, Marcelo Vanzin  wrote:
>>> >>
>>> >> On Thu, Mar 24, 2016 at 4:46 PM, Reynold Xin  wrote:
>>> >> > Actually it's *way* harder to upgrade Scala from 2.10 to 2.11, than
>>> >> > upgrading the JVM runtime from 7 to 8, because Scala 2.10 and 2.11
>>> >> > are not binary compatible, whereas JVM 7 and 8 are binary compatible
>>> >> > except certain esoteric cases.
>>> >>
>>> >> True, but ask anyone who manages a large cluster how long it would
>>> >> take them to upgrade the jdk across their cluster and validate all
>>> >> their applications and everything... binary compatibility is a tiny
>>> >> drop in that bucket.
>>> >>
>>> >> --
>>> >> Marcelo
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Marcelo
>>>
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: dev-h...@spark.apache.org
>>>
>>>
>


Re: Does SparkSQL have an official JDBC/ODBC driver?

2016-03-25 Thread Reynold Xin
No - it is too painful to develop a jdbc/odbc driver.


On Thu, Mar 24, 2016 at 11:56 PM, sage  wrote:

> Hi all,
> Does SparkSQL have an official JDBC/ODBC driver?
> I have only found third-party ODBC/JDBC drivers, like Simba, and most
> third-party ODBC/JDBC drivers are not free to use.


Does SparkSQL have an official JDBC/ODBC driver?

2016-03-25 Thread sage
Hi all,
   Does SparkSQL have an official JDBC/ODBC driver?
   I have only found third-party ODBC/JDBC drivers, like Simba, and most
third-party ODBC/JDBC drivers are not free to use.



