Re: renaming "minor release" to "feature release"

2016-07-28 Thread vaquar khan
+1
Though the following is the commonly used standard for releases
(http://semver.org/), "feature" also looks good, since a minor release
indicates that significant features have been added:

   1. MAJOR version when you make incompatible API changes,
   2. MINOR version when you add functionality in a backwards-compatible
   manner, and
   3. PATCH version when you make backwards-compatible bug fixes.


Apart from replacing the word "Minor" with "feature", there are no other
changes to the versioning policy.

regards,
Vaquar khan

On Thu, Jul 28, 2016 at 6:20 PM, Matei Zaharia 
wrote:

> I also agree with this given the way we develop stuff. We don't really
> want to move to possibly-API-breaking major releases super often, but we do
> have lots of large features that come out all the time, and our current
> name doesn't convey that.
>
> Matei
>
> On Jul 28, 2016, at 4:15 PM, Reynold Xin  wrote:
>
> Yea definitely. Those are consistent with what is defined here:
> https://cwiki.apache.org/confluence/display/SPARK/Spark+Versioning+Policy
>
> The only change I'm proposing is replacing "minor" with "feature".
>
>
> On Thu, Jul 28, 2016 at 4:10 PM, Sean Owen  wrote:
>
>> Although 'minor' is the standard term, the important thing is making
>> the nature of the release understood. 'feature release' seems OK to me
>> as an additional description.
>>
>> Is it worth agreeing on or stating a little more about the theory?
>>
>> patch release: backwards/forwards compatible within a minor release,
>> generally fixes only
>> minor/feature release: backwards compatible within a major release,
>> not forward; generally also includes new features
>> major release: not backwards compatible and may remove or change
>> existing features
>>
>> On Thu, Jul 28, 2016 at 3:46 PM, Reynold Xin  wrote:
>> > tl;dr
>> >
>> > I would like to propose renaming “minor release” to “feature release” in
>> > Apache Spark.
>> >
>> >
>> > details
>> >
>> > Apache Spark’s official versioning policy follows roughly semantic
>> > versioning. Each Spark release is versioned as
>> > [major].[minor].[maintenance]. That is to say, 1.0.0 and 2.0.0 are both
>> > “major releases”, whereas “1.1.0” and “1.3.0” would be minor releases.
>> >
>> > I have gotten a lot of feedback from users that the word “minor” is
>> > confusing and does not accurately describes those releases. When users
>> hear
>> > the word “minor”, they think it is a small update that introduces couple
>> > minor features and some bug fixes. But if you look at the history of
>> Spark
>> > 1.x, here are just a subset of large features added:
>> >
>> > Spark 1.1: sort-based shuffle, JDBC/ODBC server, new stats library, 2-5X
>> > perf improvement for machine learning.
>> >
>> > Spark 1.2: HA for streaming, new network module, Python API for
>> streaming,
>> > ML pipelines, data source API.
>> >
>> > Spark 1.3: DataFrame API, Spark SQL graduate out of alpha, tons of new
>> > algorithms in machine learning.
>> >
>> > Spark 1.4: SparkR, Python 3 support, DAG viz, robust joins in SQL, math
>> > functions, window functions, SQL analytic functions, Python API for
>> > pipelines.
>> >
>> > Spark 1.5: code generation, Project Tungsten
>> >
>> > Spark 1.6: automatic memory management, Dataset API, ML pipeline
>> persistence
>> >
>> >
>> > So while “minor” is an accurate depiction of the releases from an API
>> > compatibiility point of view, we are miscommunicating and doing Spark a
>> > disservice by calling these releases “minor”. I would actually call
>> these
>> > releases “major”, but then it would be a larger deviation from semantic
>> > versioning. I think calling these “feature releases” would be a smaller
>> > change and a more accurate depiction of what they are.
>> >
>> > That said, I’m not attached to the name “feature” and am open to
>> > suggestions, as long as they don’t convey the notion of “minor”.
>> >
>> >
>>
>
>
>


-- 
Regards,
Vaquar Khan
+91 830-851-1500


RE: ERROR: java.net.UnknownHostException

2016-07-28 Thread Miki Shingo
All,

I have resolved the issue.
Sorry for the interruption.

Regards,

  Miki

-Original Message-
From: Shingo Miki(新郷 美紀) 
Sent: Thursday, July 28, 2016 6:34 PM
To: dev@spark.apache.org
Subject: ERROR: java.net.UnknownHostException 

To whoever has knowledge of this:

I have faced the following error when trying to use an HA configuration
(java.net.UnknownHostException).

Below is the error for reference.

16/07/27 22:42:56 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 
dphmuyarn1107.hadoop.local): java.lang.IllegalArgumentException: 
java.net.UnknownHostException: hdpha
at 
org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:411)
at 
org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:311)
at 
org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:150)
at 
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
at 
org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170)
at 
org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:656)
at 
org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:438)
at 
org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:411)
at 
org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$33.apply(SparkContext.scala:1038)
at 
org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$33.apply(SparkContext.scala:1038)
at 
org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:178)
at 
org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:178)
at scala.Option.map(Option.scala:145)
at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:178)
at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:216)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:212)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:275)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.UnknownHostException: hdpha
... 36 more

Thanks & Regards

  Miki

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



PySpark UDFs with a return type of FloatType can't handle int return values

2016-07-28 Thread Nicholas Chammas
If I define a UDF in PySpark that has a return type of FloatType, but the
underlying function actually returns an int, the UDF throws the int away
and returns None.
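
To make the behavior concrete, here is a minimal sketch of what I mean
(assuming the Spark 2.0 SparkSession API; the column name and the UDF below
are made up purely for illustration):

from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import FloatType

spark = SparkSession.builder.getOrCreate()

# Declared return type is FloatType, but the Python lambda returns an int.
double_it = udf(lambda x: x * 2, FloatType())

df = spark.range(3).toDF("x")
df.select(double_it("x").alias("doubled")).show()
# Observed: every value in "doubled" comes back as null rather than 0.0, 2.0, 4.0.
# Wrapping the return value in float(...) makes the floats come through as expected.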

It seems that some machinery inside pyspark.sql.types is perhaps unaware
that it can always cast ints to floats.

Is this functionality that we would want to add in, or is it beyond the
scope of what UDFs should be expected to do?

Nick
​


Re: renaming "minor release" to "feature release"

2016-07-28 Thread Nicholas Chammas
+1

The semantics conveyed by "feature release" are compatible with the meaning
of "minor release" under strict SemVer, but as argued are clearer from a
user-communication point of view.

http://semver.org

Nick
On Thu, Jul 28, 2016 at 7:20 PM, Matei Zaharia wrote:

> I also agree with this given the way we develop stuff. We don't really
> want to move to possibly-API-breaking major releases super often, but we do
> have lots of large features that come out all the time, and our current
> name doesn't convey that.
>
> Matei
>
> On Jul 28, 2016, at 4:15 PM, Reynold Xin  wrote:
>
> Yea definitely. Those are consistent with what is defined here:
> https://cwiki.apache.org/confluence/display/SPARK/Spark+Versioning+Policy
>
> The only change I'm proposing is replacing "minor" with "feature".
>
>
> On Thu, Jul 28, 2016 at 4:10 PM, Sean Owen  wrote:
>
>> Although 'minor' is the standard term, the important thing is making
>> the nature of the release understood. 'feature release' seems OK to me
>> as an additional description.
>>
>> Is it worth agreeing on or stating a little more about the theory?
>>
>> patch release: backwards/forwards compatible within a minor release,
>> generally fixes only
>> minor/feature release: backwards compatible within a major release,
>> not forward; generally also includes new features
>> major release: not backwards compatible and may remove or change
>> existing features
>>
>> On Thu, Jul 28, 2016 at 3:46 PM, Reynold Xin  wrote:
>> > tl;dr
>> >
>> > I would like to propose renaming “minor release” to “feature release” in
>> > Apache Spark.
>> >
>> >
>> > details
>> >
>> > Apache Spark’s official versioning policy follows roughly semantic
>> > versioning. Each Spark release is versioned as
>> > [major].[minor].[maintenance]. That is to say, 1.0.0 and 2.0.0 are both
>> > “major releases”, whereas “1.1.0” and “1.3.0” would be minor releases.
>> >
>> > I have gotten a lot of feedback from users that the word “minor” is
>> > confusing and does not accurately describes those releases. When users
>> hear
>> > the word “minor”, they think it is a small update that introduces couple
>> > minor features and some bug fixes. But if you look at the history of
>> Spark
>> > 1.x, here are just a subset of large features added:
>> >
>> > Spark 1.1: sort-based shuffle, JDBC/ODBC server, new stats library, 2-5X
>> > perf improvement for machine learning.
>> >
>> > Spark 1.2: HA for streaming, new network module, Python API for
>> streaming,
>> > ML pipelines, data source API.
>> >
>> > Spark 1.3: DataFrame API, Spark SQL graduate out of alpha, tons of new
>> > algorithms in machine learning.
>> >
>> > Spark 1.4: SparkR, Python 3 support, DAG viz, robust joins in SQL, math
>> > functions, window functions, SQL analytic functions, Python API for
>> > pipelines.
>> >
>> > Spark 1.5: code generation, Project Tungsten
>> >
>> > Spark 1.6: automatic memory management, Dataset API, ML pipeline
>> persistence
>> >
>> >
>> > So while “minor” is an accurate depiction of the releases from an API
>> > compatibiility point of view, we are miscommunicating and doing Spark a
>> > disservice by calling these releases “minor”. I would actually call
>> these
>> > releases “major”, but then it would be a larger deviation from semantic
>> > versioning. I think calling these “feature releases” would be a smaller
>> > change and a more accurate depiction of what they are.
>> >
>> > That said, I’m not attached to the name “feature” and am open to
>> > suggestions, as long as they don’t convey the notion of “minor”.
>> >
>> >
>>
>
>
>


Re: renaming "minor release" to "feature release"

2016-07-28 Thread Matei Zaharia
I also agree with this given the way we develop stuff. We don't really want to 
move to possibly-API-breaking major releases super often, but we do have lots 
of large features that come out all the time, and our current name doesn't 
convey that.

Matei

> On Jul 28, 2016, at 4:15 PM, Reynold Xin  wrote:
> 
> Yea definitely. Those are consistent with what is defined here: 
> https://cwiki.apache.org/confluence/display/SPARK/Spark+Versioning+Policy 
> 
> 
> The only change I'm proposing is replacing "minor" with "feature".
> 
> 
> On Thu, Jul 28, 2016 at 4:10 PM, Sean Owen wrote:
> Although 'minor' is the standard term, the important thing is making
> the nature of the release understood. 'feature release' seems OK to me
> as an additional description.
> 
> Is it worth agreeing on or stating a little more about the theory?
> 
> patch release: backwards/forwards compatible within a minor release,
> generally fixes only
> minor/feature release: backwards compatible within a major release,
> not forward; generally also includes new features
> major release: not backwards compatible and may remove or change
> existing features
> 
> On Thu, Jul 28, 2016 at 3:46 PM, Reynold Xin wrote:
> > tl;dr
> >
> > I would like to propose renaming “minor release” to “feature release” in
> > Apache Spark.
> >
> >
> > details
> >
> > Apache Spark’s official versioning policy follows roughly semantic
> > versioning. Each Spark release is versioned as
> > [major].[minor].[maintenance]. That is to say, 1.0.0 and 2.0.0 are both
> > “major releases”, whereas “1.1.0” and “1.3.0” would be minor releases.
> >
> > I have gotten a lot of feedback from users that the word “minor” is
> > confusing and does not accurately describes those releases. When users hear
> > the word “minor”, they think it is a small update that introduces couple
> > minor features and some bug fixes. But if you look at the history of Spark
> > 1.x, here are just a subset of large features added:
> >
> > Spark 1.1: sort-based shuffle, JDBC/ODBC server, new stats library, 2-5X
> > perf improvement for machine learning.
> >
> > Spark 1.2: HA for streaming, new network module, Python API for streaming,
> > ML pipelines, data source API.
> >
> > Spark 1.3: DataFrame API, Spark SQL graduate out of alpha, tons of new
> > algorithms in machine learning.
> >
> > Spark 1.4: SparkR, Python 3 support, DAG viz, robust joins in SQL, math
> > functions, window functions, SQL analytic functions, Python API for
> > pipelines.
> >
> > Spark 1.5: code generation, Project Tungsten
> >
> > Spark 1.6: automatic memory management, Dataset API, ML pipeline persistence
> >
> >
> > So while “minor” is an accurate depiction of the releases from an API
> > compatibiility point of view, we are miscommunicating and doing Spark a
> > disservice by calling these releases “minor”. I would actually call these
> > releases “major”, but then it would be a larger deviation from semantic
> > versioning. I think calling these “feature releases” would be a smaller
> > change and a more accurate depiction of what they are.
> >
> > That said, I’m not attached to the name “feature” and am open to
> > suggestions, as long as they don’t convey the notion of “minor”.
> >
> >
> 



Re: ERROR: java.net.UnknownHostException

2016-07-28 Thread Kousuke Saruta

Hi Miki,

What version of Spark are you using?
If the version is > 1.4,  you might hit SPARK-11227.

- Kousuke

On 2016/07/28 18:34, Miki Shingo wrote:


To whoever has knowledge of this:

I have faced the following error when trying to use an HA configuration
(java.net.UnknownHostException).

Below is the error for reference.

16/07/27 22:42:56 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 
dphmuyarn1107.hadoop.local): java.lang.IllegalArgumentException: 
java.net.UnknownHostException: hdpha
 at 
org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:411)
 at 
org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:311)
 at 
org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
 at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678)
 at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
 at 
org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:150)
 at 
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
 at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
 at 
org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
 at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170)
 at 
org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:656)
 at 
org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:438)
 at 
org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:411)
 at 
org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$33.apply(SparkContext.scala:1038)
 at 
org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$33.apply(SparkContext.scala:1038)
 at 
org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:178)
 at 
org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:178)
 at scala.Option.map(Option.scala:145)
 at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:178)
 at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:216)
 at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:212)
 at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
 at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
 at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:275)
 at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
 at org.apache.spark.scheduler.Task.run(Task.scala:89)
 at 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.UnknownHostException: hdpha
 ... 36 more

Thanks & Regards

   Miki

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org




-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: renaming "minor release" to "feature release"

2016-07-28 Thread Reynold Xin
Yea definitely. Those are consistent with what is defined here:
https://cwiki.apache.org/confluence/display/SPARK/Spark+Versioning+Policy

The only change I'm proposing is replacing "minor" with "feature".


On Thu, Jul 28, 2016 at 4:10 PM, Sean Owen  wrote:

> Although 'minor' is the standard term, the important thing is making
> the nature of the release understood. 'feature release' seems OK to me
> as an additional description.
>
> Is it worth agreeing on or stating a little more about the theory?
>
> patch release: backwards/forwards compatible within a minor release,
> generally fixes only
> minor/feature release: backwards compatible within a major release,
> not forward; generally also includes new features
> major release: not backwards compatible and may remove or change
> existing features
>
> On Thu, Jul 28, 2016 at 3:46 PM, Reynold Xin  wrote:
> > tl;dr
> >
> > I would like to propose renaming “minor release” to “feature release” in
> > Apache Spark.
> >
> >
> > details
> >
> > Apache Spark’s official versioning policy follows roughly semantic
> > versioning. Each Spark release is versioned as
> > [major].[minor].[maintenance]. That is to say, 1.0.0 and 2.0.0 are both
> > “major releases”, whereas “1.1.0” and “1.3.0” would be minor releases.
> >
> > I have gotten a lot of feedback from users that the word “minor” is
> > confusing and does not accurately describes those releases. When users
> hear
> > the word “minor”, they think it is a small update that introduces couple
> > minor features and some bug fixes. But if you look at the history of
> Spark
> > 1.x, here are just a subset of large features added:
> >
> > Spark 1.1: sort-based shuffle, JDBC/ODBC server, new stats library, 2-5X
> > perf improvement for machine learning.
> >
> > Spark 1.2: HA for streaming, new network module, Python API for
> streaming,
> > ML pipelines, data source API.
> >
> > Spark 1.3: DataFrame API, Spark SQL graduate out of alpha, tons of new
> > algorithms in machine learning.
> >
> > Spark 1.4: SparkR, Python 3 support, DAG viz, robust joins in SQL, math
> > functions, window functions, SQL analytic functions, Python API for
> > pipelines.
> >
> > Spark 1.5: code generation, Project Tungsten
> >
> > Spark 1.6: automatic memory management, Dataset API, ML pipeline
> persistence
> >
> >
> > So while “minor” is an accurate depiction of the releases from an API
> > compatibiility point of view, we are miscommunicating and doing Spark a
> > disservice by calling these releases “minor”. I would actually call these
> > releases “major”, but then it would be a larger deviation from semantic
> > versioning. I think calling these “feature releases” would be a smaller
> > change and a more accurate depiction of what they are.
> >
> > That said, I’m not attached to the name “feature” and am open to
> > suggestions, as long as they don’t convey the notion of “minor”.
> >
> >
>


Re: renaming "minor release" to "feature release"

2016-07-28 Thread Sean Owen
Although 'minor' is the standard term, the important thing is making
the nature of the release understood. 'feature release' seems OK to me
as an additional description.

Is it worth agreeing on or stating a little more about the theory?

patch release: backwards/forwards compatible within a minor release,
generally fixes only
minor/feature release: backwards compatible within a major release,
not forward; generally also includes new features
major release: not backwards compatible and may remove or change
existing features
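
For concreteness, the three rules above could be sketched roughly like this
(just a Python illustration of the intended semantics, not anything Spark
actually enforces; versions are treated as (major, minor, patch) tuples):

def compatible(built_against, running_on):
    """Whether code built against one release should run on another."""
    b_major, b_minor, _ = built_against
    r_major, r_minor, _ = running_on
    if b_major != r_major:
        return False  # major releases may remove or change existing features
    if (b_major, b_minor) == (r_major, r_minor):
        return True   # patch releases: compatible both ways within a minor line
    # minor/feature releases: backwards compatible, not forwards
    return (r_major, r_minor) > (b_major, b_minor)

# e.g. compatible((1, 3, 0), (1, 6, 2)) -> True, but
#      compatible((1, 6, 2), (1, 3, 0)) -> False, and
#      compatible((1, 6, 2), (2, 0, 0)) -> False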

On Thu, Jul 28, 2016 at 3:46 PM, Reynold Xin  wrote:
> tl;dr
>
> I would like to propose renaming “minor release” to “feature release” in
> Apache Spark.
>
>
> details
>
> Apache Spark’s official versioning policy follows roughly semantic
> versioning. Each Spark release is versioned as
> [major].[minor].[maintenance]. That is to say, 1.0.0 and 2.0.0 are both
> “major releases”, whereas “1.1.0” and “1.3.0” would be minor releases.
>
> I have gotten a lot of feedback from users that the word “minor” is
> confusing and does not accurately describes those releases. When users hear
> the word “minor”, they think it is a small update that introduces couple
> minor features and some bug fixes. But if you look at the history of Spark
> 1.x, here are just a subset of large features added:
>
> Spark 1.1: sort-based shuffle, JDBC/ODBC server, new stats library, 2-5X
> perf improvement for machine learning.
>
> Spark 1.2: HA for streaming, new network module, Python API for streaming,
> ML pipelines, data source API.
>
> Spark 1.3: DataFrame API, Spark SQL graduate out of alpha, tons of new
> algorithms in machine learning.
>
> Spark 1.4: SparkR, Python 3 support, DAG viz, robust joins in SQL, math
> functions, window functions, SQL analytic functions, Python API for
> pipelines.
>
> Spark 1.5: code generation, Project Tungsten
>
> Spark 1.6: automatic memory management, Dataset API, ML pipeline persistence
>
>
> So while “minor” is an accurate depiction of the releases from an API
> compatibiility point of view, we are miscommunicating and doing Spark a
> disservice by calling these releases “minor”. I would actually call these
> releases “major”, but then it would be a larger deviation from semantic
> versioning. I think calling these “feature releases” would be a smaller
> change and a more accurate depiction of what they are.
>
> That said, I’m not attached to the name “feature” and am open to
> suggestions, as long as they don’t convey the notion of “minor”.
>
>

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [build system] jenkins downtime friday afternoon, july 29th 2016

2016-07-28 Thread shane knapp
reminder -- this is happening TOMORROW.

On Wed, Jul 27, 2016 at 5:39 PM, shane knapp  wrote:
> reminder -- this is happening friday afternoon.
>
> i will pause the build queue late friday morning.
>
> On Mon, Jul 25, 2016 at 2:29 PM, shane knapp  wrote:
>> around 1pm  friday, july 29th, we will be taking jenkins down for a
>> rack move and celebrating national systems administrator day.
>>
>> the outage should only last a couple of hours at most, and will be
>> concluded with champagne toasts.
>>
>> yes, the outage and holiday are real, but the champagne in the colo is
>> not...  ;)
>>
>> shane

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



renaming "minor release" to "feature release"

2016-07-28 Thread Reynold Xin
*tl;dr*

I would like to propose renaming “minor release” to “feature release” in
Apache Spark.


*details*

Apache Spark’s official versioning policy roughly follows semantic
versioning. Each Spark release is versioned as
[major].[minor].[maintenance]. That is to say, 1.0.0 and 2.0.0 are both
“major releases”, whereas “1.1.0” and “1.3.0” would be minor releases.

I have gotten a lot of feedback from users that the word “minor” is
confusing and does not accurately describe those releases. When users hear
the word “minor”, they think it is a small update that introduces a couple of
minor features and some bug fixes. But if you look at the history of Spark
1.x, here are just a subset of large features added:

Spark 1.1: sort-based shuffle, JDBC/ODBC server, new stats library, 2-5X
perf improvement for machine learning.

Spark 1.2: HA for streaming, new network module, Python API for streaming,
ML pipelines, data source API.

Spark 1.3: DataFrame API, Spark SQL graduating out of alpha, tons of new
algorithms in machine learning.

Spark 1.4: SparkR, Python 3 support, DAG viz, robust joins in SQL, math
functions, window functions, SQL analytic functions, Python API for
pipelines.

Spark 1.5: code generation, Project Tungsten

Spark 1.6: automatic memory management, Dataset API, ML pipeline persistence


So while “minor” is an accurate depiction of the releases from an API
compatibility point of view, we are miscommunicating and doing Spark a
disservice by calling these releases “minor”. I would actually call these
releases “major”, but then it would be a larger deviation from semantic
versioning. I think calling these “feature releases” would be a smaller
change and a more accurate depiction of what they are.

That said, I’m not attached to the name “feature” and am open to
suggestions, as long as they don’t convey the notion of “minor”.


ERROR: java.net.UnknownHostException

2016-07-28 Thread Miki Shingo
To whoever has knowledge of this:

I have faced the following error when trying to use an HA configuration
(java.net.UnknownHostException).

Below is the error for reference.

16/07/27 22:42:56 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 
dphmuyarn1107.hadoop.local): java.lang.IllegalArgumentException: 
java.net.UnknownHostException: hdpha
at 
org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:411)
at 
org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:311)
at 
org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:150)
at 
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
at 
org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170)
at 
org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:656)
at 
org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:438)
at 
org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:411)
at 
org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$33.apply(SparkContext.scala:1038)
at 
org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$33.apply(SparkContext.scala:1038)
at 
org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:178)
at 
org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:178)
at scala.Option.map(Option.scala:145)
at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:178)
at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:216)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:212)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:275)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.UnknownHostException: hdpha
... 36 more

Thanks & Regards

  Miki

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



spark run shell On yarn

2016-07-28 Thread censj
16/07/28 17:07:34 WARN shortcircuit.DomainSocketFactory: The short-circuit 
local reads feature cannot be used because libhadoop cannot be loaded.
java.lang.NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig
  at 
org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:45)
  at 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:163)
  at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
  at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:150)
  at 
org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
  at 
org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:149)
  at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
  at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2256)
  at 
org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:831)
  at 
org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:823)
  at scala.Option.getOrElse(Option.scala:121)
  at 
org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:823)
  at org.apache.spark.repl.Main$.createSparkSession(Main.scala:95)
  ... 47 elided
Caused by: java.lang.ClassNotFoundException: 
com.sun.jersey.api.client.config.ClientConfig
  at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
  ... 60 more
<console>:14: error: not found: value spark
       import spark.implicits._
              ^
<console>:14: error: not found: value spark
       import spark.sql
              ^

Hi,
I am using Spark 2.0, but when I run
"/etc/spark-2.0.0-bin-hadoop2.6/bin/spark-shell --master yarn", the error
above appears.

/etc/spark-2.0.0-bin-hadoop2.6/bin/spark-submit
export YARN_CONF_DIR=/etc/hadoop/conf
export HADOOP_CONF_DIR=/etc/hadoop/conf
export SPARK_HOME=/etc/spark-2.0.0-bin-hadoop2.6


How can I fix this?





===
Name: cen sujun
Mobile: 13067874572
Mail: ce...@lotuseed.com