[jira] [Updated] (SPARK-24093) Make some fields of KafkaStreamWriter/InternalRowMicroBatchWriter visible to outside of the classes

2018-04-25 Thread Weiqing Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiqing Yang updated SPARK-24093:
-
Description: To let third parties get information about the streaming writer, for 
example which "writer" is used and which "topic" the streaming data are written 
into, this jira is created to make the relevant fields of KafkaStreamWriter and 
InternalRowMicroBatchWriter visible outside of those classes.  (was: We are 
working on the Spark Atlas Connector 
([https://github.com/hortonworks-spark/spark-atlas-connector]) and adding support 
for Spark Streaming. As SAC needs to know the "writer" and the "topic" that 
streaming data are written into, this jira is created to make the relevant fields 
of KafkaStreamWriter and InternalRowMicroBatchWriter visible outside of those 
classes.)

> Make some fields of KafkaStreamWriter/InternalRowMicroBatchWriter visible to 
> outside of the classes
> ---
>
> Key: SPARK-24093
> URL: https://issues.apache.org/jira/browse/SPARK-24093
> Project: Spark
>  Issue Type: Wish
>  Components: Structured Streaming
>Affects Versions: 2.3.0
>Reporter: Weiqing Yang
>Priority: Minor
>
> To let third parties get information about the streaming writer, for example 
> which "writer" is used and which "topic" the streaming data are written into, 
> this jira is created to make the relevant fields of KafkaStreamWriter and 
> InternalRowMicroBatchWriter visible outside of those classes.
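
As a rough illustration (not part of the jira itself): once the fields are exposed, a
tool such as SAC could pattern-match on the writer to read the target topic. The
sketch below is hedged; it assumes the Spark 2.3 shape of KafkaStreamWriter (a
{{topic: Option[String]}} field) and that the caller already holds a DataSourceWriter
reference.
{code}
import org.apache.spark.sql.kafka010.KafkaStreamWriter
import org.apache.spark.sql.sources.v2.writer.DataSourceWriter

// Hedged sketch (assumes the visibility change proposed here): pull the target
// topic out of a streaming writer so an external tool can record the lineage.
def kafkaTargetTopic(writer: DataSourceWriter): Option[String] = writer match {
  case w: KafkaStreamWriter => w.topic // `topic` is the field this jira proposes to expose
  case _                    => None
}
{code}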






[jira] [Created] (SPARK-24093) Make some fields of KafkaStreamWriter/InternalRowMicroBatchWriter visible to outside of the classes

2018-04-25 Thread Weiqing Yang (JIRA)
Weiqing Yang created SPARK-24093:


 Summary: Make some fields of 
KafkaStreamWriter/InternalRowMicroBatchWriter visible to outside of the classes
 Key: SPARK-24093
 URL: https://issues.apache.org/jira/browse/SPARK-24093
 Project: Spark
  Issue Type: Wish
  Components: Structured Streaming
Affects Versions: 2.3.0
Reporter: Weiqing Yang


We are working on the Spark Atlas Connector 
([https://github.com/hortonworks-spark/spark-atlas-connector]) and adding support 
for Spark Streaming. As SAC needs to know the "writer" and the "topic" that 
streaming data are written into, this jira is created to make the relevant fields 
of KafkaStreamWriter and InternalRowMicroBatchWriter visible outside of those 
classes.






[jira] [Commented] (SPARK-21697) NPE & ExceptionInInitializerError trying to load UTF from HDFS

2017-08-10 Thread Weiqing Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122234#comment-16122234
 ] 

Weiqing Yang commented on SPARK-21697:
--

Thanks for filing this issue!

> NPE & ExceptionInInitializerError trying to load UTF from HDFS
> --
>
> Key: SPARK-21697
> URL: https://issues.apache.org/jira/browse/SPARK-21697
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1
> Environment: Spark Client mode, Hadoop 2.6.0
>Reporter: Steve Loughran
>Priority: Minor
>
> Reported on [the 
> PR|https://github.com/apache/spark/pull/17342#issuecomment-321438157] for 
> SPARK-12868: trying to load a UDF off HDFS is triggering an 
> {{ExceptionInInitializerError}}, caused by an NPE which should only happen if 
> the commons-logging {{LOG}} field is null.
> Hypothesis: the commons-logging scan for {{commons-logging.properties}} is 
> happening on the classpath containing the HDFS JAR; this triggers a download of 
> that JAR, which needs to force in commons-logging, and, as that's not inited 
> yet, NPEs






[jira] [Commented] (SPARK-6628) ClassCastException occurs when executing sql statement "insert into" on hbase table

2017-05-15 Thread Weiqing Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16011556#comment-16011556
 ] 

Weiqing Yang commented on SPARK-6628:
-

Hi [~srowen], I just submitted a PR for this. Could you please help review it? 
Thanks.

> ClassCastException occurs when executing sql statement "insert into" on hbase 
> table
> ---
>
> Key: SPARK-6628
> URL: https://issues.apache.org/jira/browse/SPARK-6628
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: meiyoula
>
> Error: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 1 in stage 3.0 failed 4 times, most recent failure: Lost task 1.3 in 
> stage 3.0 (TID 12, vm-17): java.lang.ClassCastException: 
> org.apache.hadoop.hive.hbase.HiveHBaseTableOutputFormat cannot be cast to 
> org.apache.hadoop.hive.ql.io.HiveOutputFormat
> at 
> org.apache.spark.sql.hive.SparkHiveWriterContainer.outputFormat$lzycompute(hiveWriterContainers.scala:72)
> at 
> org.apache.spark.sql.hive.SparkHiveWriterContainer.outputFormat(hiveWriterContainers.scala:71)
> at 
> org.apache.spark.sql.hive.SparkHiveWriterContainer.getOutputName(hiveWriterContainers.scala:91)
> at 
> org.apache.spark.sql.hive.SparkHiveWriterContainer.initWriters(hiveWriterContainers.scala:115)
> at 
> org.apache.spark.sql.hive.SparkHiveWriterContainer.executorSideSetup(hiveWriterContainers.scala:84)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1(InsertIntoHiveTable.scala:112)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:93)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:93)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
> at org.apache.spark.scheduler.Task.run(Task.scala:56)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:197)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)






[jira] [Commented] (SPARK-6628) ClassCastException occurs when executing sql statement "insert into" on hbase table

2017-05-15 Thread Weiqing Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16011490#comment-16011490
 ] 

Weiqing Yang commented on SPARK-6628:
-

We met with this issue too.

The major issue is that {{org.apache.hadoop.hive.hbase.HiveHBaseTableOutputFormat}} 
cannot be cast to {{org.apache.hadoop.hive.ql.io.HiveOutputFormat}}.
The reason is:
{code}
public interface HiveOutputFormat<K, V> extends OutputFormat<K, V> {...}

public class HiveHBaseTableOutputFormat extends
    TableOutputFormat<ImmutableBytesWritable> implements
    OutputFormat<ImmutableBytesWritable, Object> {...}
{code}

From the two snippets above, we can see that both HiveHBaseTableOutputFormat and 
HiveOutputFormat extend/implement OutputFormat, but neither can be cast to the 
other.

Spark initializes the output format in SparkHiveWriterContainer in Spark 1.6, 2.0 
and 2.1 (or in HiveFileFormat on Spark 2.2/master):
{code}
@transient private lazy val outputFormat =
  jobConf.value.getOutputFormat.asInstanceOf[HiveOutputFormat[AnyRef, Writable]]
{code}
Notice: the cast target here is {color:red}HiveOutputFormat{color}.
However, when users write data into HBase, the output format is 
HiveHBaseTableOutputFormat, which is not an instance of HiveOutputFormat.

I am going to submit a PR for this.
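
For context, a hedged sketch of the kind of change a fix could make (not the actual
patch): keep the format as a plain OutputFormat and only treat it as a
HiveOutputFormat when it really is one, instead of casting unconditionally.
{code}
import org.apache.hadoop.hive.ql.io.HiveOutputFormat
import org.apache.hadoop.io.Writable
import org.apache.hadoop.mapred.{JobConf, OutputFormat}

// Hedged sketch only: HiveHBaseTableOutputFormat implements OutputFormat but not
// HiveOutputFormat, so a blind asInstanceOf[HiveOutputFormat[...]] must fail.
def resolveOutputFormat(jobConf: JobConf): OutputFormat[AnyRef, Writable] = {
  val format = jobConf.getOutputFormat.asInstanceOf[OutputFormat[AnyRef, Writable]]
  format match {
    case hive: HiveOutputFormat[_, _] =>
      // Hive-specific paths (e.g. getHiveRecordWriter) may be used for this branch.
      hive.asInstanceOf[OutputFormat[AnyRef, Writable]]
    case other =>
      // Non-Hive formats, such as the HBase one, go through the generic API.
      other
  }
}
{code}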


> ClassCastException occurs when executing sql statement "insert into" on hbase 
> table
> ---
>
> Key: SPARK-6628
> URL: https://issues.apache.org/jira/browse/SPARK-6628
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: meiyoula
>
> Error: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 1 in stage 3.0 failed 4 times, most recent failure: Lost task 1.3 in 
> stage 3.0 (TID 12, vm-17): java.lang.ClassCastException: 
> org.apache.hadoop.hive.hbase.HiveHBaseTableOutputFormat cannot be cast to 
> org.apache.hadoop.hive.ql.io.HiveOutputFormat
> at 
> org.apache.spark.sql.hive.SparkHiveWriterContainer.outputFormat$lzycompute(hiveWriterContainers.scala:72)
> at 
> org.apache.spark.sql.hive.SparkHiveWriterContainer.outputFormat(hiveWriterContainers.scala:71)
> at 
> org.apache.spark.sql.hive.SparkHiveWriterContainer.getOutputName(hiveWriterContainers.scala:91)
> at 
> org.apache.spark.sql.hive.SparkHiveWriterContainer.initWriters(hiveWriterContainers.scala:115)
> at 
> org.apache.spark.sql.hive.SparkHiveWriterContainer.executorSideSetup(hiveWriterContainers.scala:84)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1(InsertIntoHiveTable.scala:112)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:93)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:93)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
> at org.apache.spark.scheduler.Task.run(Task.scala:56)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:197)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)






[jira] [Commented] (SPARK-15857) Add Caller Context in Spark

2017-02-13 Thread Weiqing Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864491#comment-15864491
 ] 

Weiqing Yang commented on SPARK-15857:
--

Thanks. [~zsxwing]

> Add Caller Context in Spark
> ---
>
> Key: SPARK-15857
> URL: https://issues.apache.org/jira/browse/SPARK-15857
> Project: Spark
>  Issue Type: New Feature
>Reporter: Weiqing Yang
>
> Hadoop has implemented a log-tracing feature, the caller context (JIRAs: 
> HDFS-9184 and YARN-4349). The motivation is to better diagnose and understand 
> how specific applications impact parts of the Hadoop system and what potential 
> problems they may be creating (e.g. overloading the NN). As mentioned in 
> HDFS-9184, for a given HDFS operation it is very helpful to track which 
> upper-level job issued it. The upper-level callers may be specific Oozie tasks, 
> MR jobs, Hive queries, or Spark jobs. 
> Hadoop ecosystem components like MapReduce, Tez (TEZ-2851), Hive (HIVE-12249, 
> HIVE-12254) and Pig (PIG-4714) have implemented their caller contexts. Those 
> systems invoke the HDFS client API and the YARN client API to set up the caller 
> context, and also expose an API so that their own callers can pass a caller 
> context in.
> Lots of Spark applications run on YARN/HDFS. Spark can also set its caller 
> context by invoking the HDFS/YARN APIs, and can likewise expose an API so that 
> its upstream applications can set up their own caller contexts. In the end, the 
> Spark caller context written into the YARN log / HDFS log can be associated 
> with the task id, stage id, job id and app id. That is also very helpful for 
> Spark users to identify tasks, especially if Spark supports multi-tenant 
> environments in the future.
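
To make the idea concrete, here is a hedged sketch (not the merged implementation)
of setting a caller context from Spark through the Hadoop API introduced in
HDFS-9184. It uses reflection so that builds against older Hadoop versions still
compile and run; the context string layout below is an assumption.
{code}
import scala.util.control.NonFatal

import org.apache.spark.util.Utils

// Hedged sketch: build org.apache.hadoop.ipc.CallerContext reflectively and set it
// as the current context, so HDFS/YARN logs can carry Spark task/stage/job/app ids.
def setSparkCallerContext(appId: String, jobId: Int, stageId: Int, taskId: Long): Unit = {
  try {
    val context = s"SPARK_appId_${appId}_jobId_${jobId}_stageId_${stageId}_taskId_$taskId"
    val callerContextClass = Utils.classForName("org.apache.hadoop.ipc.CallerContext")
    val builderClass = Utils.classForName("org.apache.hadoop.ipc.CallerContext$Builder")
    val builder = builderClass.getConstructor(classOf[String]).newInstance(context)
    val callerContext = builderClass.getMethod("build").invoke(builder)
    // CallerContext.setCurrent(CallerContext) is a static method on Hadoop 2.8+.
    callerContextClass.getMethod("setCurrent", callerContextClass).invoke(null, callerContext)
  } catch {
    case NonFatal(_) => // Hadoop without HDFS-9184 on the classpath: silently skip.
  }
}
{code}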






[jira] [Resolved] (SPARK-15857) Add Caller Context in Spark

2017-02-13 Thread Weiqing Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiqing Yang resolved SPARK-15857.
--
Resolution: Fixed

> Add Caller Context in Spark
> ---
>
> Key: SPARK-15857
> URL: https://issues.apache.org/jira/browse/SPARK-15857
> Project: Spark
>  Issue Type: New Feature
>Reporter: Weiqing Yang
>
> Hadoop has implemented a log-tracing feature, the caller context (JIRAs: 
> HDFS-9184 and YARN-4349). The motivation is to better diagnose and understand 
> how specific applications impact parts of the Hadoop system and what potential 
> problems they may be creating (e.g. overloading the NN). As mentioned in 
> HDFS-9184, for a given HDFS operation it is very helpful to track which 
> upper-level job issued it. The upper-level callers may be specific Oozie tasks, 
> MR jobs, Hive queries, or Spark jobs. 
> Hadoop ecosystem components like MapReduce, Tez (TEZ-2851), Hive (HIVE-12249, 
> HIVE-12254) and Pig (PIG-4714) have implemented their caller contexts. Those 
> systems invoke the HDFS client API and the YARN client API to set up the caller 
> context, and also expose an API so that their own callers can pass a caller 
> context in.
> Lots of Spark applications run on YARN/HDFS. Spark can also set its caller 
> context by invoking the HDFS/YARN APIs, and can likewise expose an API so that 
> its upstream applications can set up their own caller contexts. In the end, the 
> Spark caller context written into the YARN log / HDFS log can be associated 
> with the task id, stage id, job id and app id. That is also very helpful for 
> Spark users to identify tasks, especially if Spark supports multi-tenant 
> environments in the future.






[jira] [Updated] (SPARK-18746) Add implicit encoders for BigDecimal, timestamp and date

2016-12-12 Thread Weiqing Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiqing Yang updated SPARK-18746:
-
Description: 
Running the code below in spark-shell produces an error:
{code}
scala> spark.createDataset(Seq(new java.math.BigDecimal(10)))
<console>:24: error: Unable to find encoder for type stored in a Dataset.  
Primitive types (Int, String, etc) and Product types (case classes) are 
supported by importing spark.implicits._  Support for serializing other types 
will be added in future releases.
   spark.createDataset(Seq(new java.math.BigDecimal(10)))
  ^

scala>
{code} 

In this PR, implicit encoders for java.math.BigDecimal, timestamp and date will be 
added.

  was:
Running the code below in spark-shell produces an error:
{code}
scala> spark.createDataset(Seq(new java.math.BigDecimal(10)))
<console>:24: error: Unable to find encoder for type stored in a Dataset.  
Primitive types (Int, String, etc) and Product types (case classes) are 
supported by importing spark.implicits._  Support for serializing other types 
will be added in future releases.
   spark.createDataset(Seq(new java.math.BigDecimal(10)))
  ^

scala>
{code} 

To fix the error above, an implicit encoder for java.math.BigDecimal will be 
added in the PR. Also, 


> Add implicit encoders for BigDecimal, timestamp and date
> 
>
> Key: SPARK-18746
> URL: https://issues.apache.org/jira/browse/SPARK-18746
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Weiqing Yang
>
> Running the code below in spark-shell produces an error:
> {code}
> scala> spark.createDataset(Seq(new java.math.BigDecimal(10)))
> <console>:24: error: Unable to find encoder for type stored in a Dataset.  
> Primitive types (Int, String, etc) and Product types (case classes) are 
> supported by importing spark.implicits._  Support for serializing other types 
> will be added in future releases.
>spark.createDataset(Seq(new java.math.BigDecimal(10)))
>   ^
> scala>
> {code} 
> In this PR, implicit encoders for java.math.BigDecimal, timestamp and date will 
> be added.






[jira] [Updated] (SPARK-18746) Add implicit encoders for BigDecimal, timestamp and date

2016-12-12 Thread Weiqing Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiqing Yang updated SPARK-18746:
-
Description: 
Running the code below in spark-shell produces an error:
{code}
scala> spark.createDataset(Seq(new java.math.BigDecimal(10)))
<console>:24: error: Unable to find encoder for type stored in a Dataset.  
Primitive types (Int, String, etc) and Product types (case classes) are 
supported by importing spark.implicits._  Support for serializing other types 
will be added in future releases.
   spark.createDataset(Seq(new java.math.BigDecimal(10)))
  ^

scala>
{code} 

In this PR, implicit encoders for BigDecimal, timestamp and date will be added.

  was:
Running the code below in spark-shell produces an error:
{code}
scala> spark.createDataset(Seq(new java.math.BigDecimal(10)))
<console>:24: error: Unable to find encoder for type stored in a Dataset.  
Primitive types (Int, String, etc) and Product types (case classes) are 
supported by importing spark.implicits._  Support for serializing other types 
will be added in future releases.
   spark.createDataset(Seq(new java.math.BigDecimal(10)))
  ^

scala>
{code} 

In this PR, implicit encoders for java.math.BigDecimal, timestamp and date will be 
added.


> Add implicit encoders for BigDecimal, timestamp and date
> 
>
> Key: SPARK-18746
> URL: https://issues.apache.org/jira/browse/SPARK-18746
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Weiqing Yang
>
> Running the code below in spark-shell produces an error:
> {code}
> scala> spark.createDataset(Seq(new java.math.BigDecimal(10)))
> <console>:24: error: Unable to find encoder for type stored in a Dataset.  
> Primitive types (Int, String, etc) and Product types (case classes) are 
> supported by importing spark.implicits._  Support for serializing other types 
> will be added in future releases.
>spark.createDataset(Seq(new java.math.BigDecimal(10)))
>   ^
> scala>
> {code} 
> In this PR, implicit encoders for BigDecimal, timestamp and date will be 
> added.






[jira] [Updated] (SPARK-18746) Add implicit encoders for BigDecimal, timestamp and date

2016-12-12 Thread Weiqing Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiqing Yang updated SPARK-18746:
-
Description: 
Running the code below in spark-shell produces an error:
{code}
scala> spark.createDataset(Seq(new java.math.BigDecimal(10)))
<console>:24: error: Unable to find encoder for type stored in a Dataset.  
Primitive types (Int, String, etc) and Product types (case classes) are 
supported by importing spark.implicits._  Support for serializing other types 
will be added in future releases.
   spark.createDataset(Seq(new java.math.BigDecimal(10)))
  ^

scala>
{code} 

To fix the error above, an implicit encoder for java.math.BigDecimal will be 
added in the PR. Also, 

  was:
Running the code below in spark-shell produces an error:
{code}
scala> spark.createDataset(Seq(new java.math.BigDecimal(10)))
<console>:24: error: Unable to find encoder for type stored in a Dataset.  
Primitive types (Int, String, etc) and Product types (case classes) are 
supported by importing spark.implicits._  Support for serializing other types 
will be added in future releases.
   spark.createDataset(Seq(new java.math.BigDecimal(10)))
  ^

scala>
{code} 

To fix the error above, {{newBigDecimalEncoder}} will be added in the PR.


> Add implicit encoders for BigDecimal, timestamp and date
> 
>
> Key: SPARK-18746
> URL: https://issues.apache.org/jira/browse/SPARK-18746
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Weiqing Yang
>
> Running the code below in spark-shell produces an error:
> {code}
> scala> spark.createDataset(Seq(new java.math.BigDecimal(10)))
> <console>:24: error: Unable to find encoder for type stored in a Dataset.  
> Primitive types (Int, String, etc) and Product types (case classes) are 
> supported by importing spark.implicits._  Support for serializing other types 
> will be added in future releases.
>spark.createDataset(Seq(new java.math.BigDecimal(10)))
>   ^
> scala>
> {code} 
> To fix the error above, an implicit encoder for java.math.BigDecimal will be 
> added in the PR. Also, 






[jira] [Updated] (SPARK-18746) Add implicit encoders for BigDecimal, timestamp and date

2016-12-12 Thread Weiqing Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiqing Yang updated SPARK-18746:
-
Summary: Add implicit encoders for BigDecimal, timestamp and date  (was: 
Add newBigDecimalEncoder)

> Add implicit encoders for BigDecimal, timestamp and date
> 
>
> Key: SPARK-18746
> URL: https://issues.apache.org/jira/browse/SPARK-18746
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Weiqing Yang
>
> Running the code below in spark-shell produces an error:
> {code}
> scala> spark.createDataset(Seq(new java.math.BigDecimal(10)))
> <console>:24: error: Unable to find encoder for type stored in a Dataset.  
> Primitive types (Int, String, etc) and Product types (case classes) are 
> supported by importing spark.implicits._  Support for serializing other types 
> will be added in future releases.
>spark.createDataset(Seq(new java.math.BigDecimal(10)))
>   ^
> scala>
> {code} 
> To fix the error above, {{newBigDecimalEncoder}} will be added in the PR.






[jira] [Updated] (SPARK-18697) Upgrade sbt plugins

2016-12-08 Thread Weiqing Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiqing Yang updated SPARK-18697:
-
Description: 
For 2.2.x, it's better to make sbt plugins up-to-date. The following sbt 
plugins will be upgraded:
{code}
sbteclipse-plugin: 4.0.0 -> 5.0.1
sbt-mima-plugin: 0.1.11 -> 0.1.12
org.ow2.asm/asm: 5.0.3 -> 5.1 
org.ow2.asm/asm-commons: 5.0.3 -> 5.1 
{code}

  was:
For 2.2.x, it's better to make sbt plugins up-to-date. The following sbt 
plugins will be upgraded:
{code}
sbteclipse-plugin: 4.0.0 -> 5.0.1
sbt-mima-plugin: 0.1.11 -> 0.1.12
org.ow2.asm/asm: 5.0.3 -> 5.1 
org.ow2.asm/asm-commons: 5.0.3 -> 5.1 
{code}

All other plugins are up-to-date. 


> Upgrade sbt plugins
> ---
>
> Key: SPARK-18697
> URL: https://issues.apache.org/jira/browse/SPARK-18697
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Weiqing Yang
>Assignee: Weiqing Yang
>Priority: Trivial
>
> For 2.2.x, it's better to make sbt plugins up-to-date. The following sbt 
> plugins will be upgraded:
> {code}
> sbteclipse-plugin: 4.0.0 -> 5.0.1
> sbt-mima-plugin: 0.1.11 -> 0.1.12
> org.ow2.asm/asm: 5.0.3 -> 5.1 
> org.ow2.asm/asm-commons: 5.0.3 -> 5.1 
> {code}






[jira] [Updated] (SPARK-18697) Upgrade sbt plugins

2016-12-08 Thread Weiqing Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiqing Yang updated SPARK-18697:
-
Description: 
For 2.2.x, it's better to make sbt plugins up-to-date. The following sbt 
plugins will be upgraded:
{code}
sbteclipse-plugin: 4.0.0 -> 5.0.1
sbt-mima-plugin: 0.1.11 -> 0.1.12
org.ow2.asm/asm: 5.0.3 -> 5.1 
org.ow2.asm/asm-commons: 5.0.3 -> 5.1 
{code}

All other plugins are up-to-date. 

  was:
For 2.2.x, it's better to make sbt plugins up-to-date. The following sbt 
plugins will be upgraded:
{code}
sbt-assembly: 0.11.2 -> 0.14.3
sbteclipse-plugin: 4.0.0 -> 5.0.1
sbt-mima-plugin: 0.1.11 -> 0.1.12
org.ow2.asm/asm: 5.0.3 -> 5.1 
org.ow2.asm/asm-commons: 5.0.3 -> 5.1 
{code}

All other plugins are up-to-date. 


> Upgrade sbt plugins
> ---
>
> Key: SPARK-18697
> URL: https://issues.apache.org/jira/browse/SPARK-18697
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Weiqing Yang
>Assignee: Weiqing Yang
>Priority: Trivial
>
> For 2.2.x, it's better to make sbt plugins up-to-date. The following sbt 
> plugins will be upgraded:
> {code}
> sbteclipse-plugin: 4.0.0 -> 5.0.1
> sbt-mima-plugin: 0.1.11 -> 0.1.12
> org.ow2.asm/asm: 5.0.3 -> 5.1 
> org.ow2.asm/asm-commons: 5.0.3 -> 5.1 
> {code}
> All other plugins are up-to-date. 






[jira] [Created] (SPARK-18746) Add newBigDecimalEncoder

2016-12-06 Thread Weiqing Yang (JIRA)
Weiqing Yang created SPARK-18746:


 Summary: Add newBigDecimalEncoder
 Key: SPARK-18746
 URL: https://issues.apache.org/jira/browse/SPARK-18746
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Weiqing Yang


Running the code below in spark-shell produces an error:
{code}
scala> spark.createDataset(Seq(new java.math.BigDecimal(10)))
<console>:24: error: Unable to find encoder for type stored in a Dataset.  
Primitive types (Int, String, etc) and Product types (case classes) are 
supported by importing spark.implicits._  Support for serializing other types 
will be added in future releases.
   spark.createDataset(Seq(new java.math.BigDecimal(10)))
  ^

scala>
{code} 

To fix the error above, {{newBigDecimalEncoder}} will be added in the PR.
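
Until such an encoder ships, a hedged workaround sketch is to bring an explicit
encoder into scope using the existing {{Encoders.DECIMAL}} factory, which already
targets java.math.BigDecimal (the session setup below is only for illustration):
{code}
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}

val spark = SparkSession.builder().master("local[*]").appName("decimal-encoder-sketch").getOrCreate()

// Workaround sketch: supply the encoder explicitly instead of relying on
// spark.implicits._, which (as of this report) has no implicit for java.math.BigDecimal.
implicit val javaBigDecimalEncoder: Encoder[java.math.BigDecimal] = Encoders.DECIMAL
val ds = spark.createDataset(Seq(new java.math.BigDecimal(10)))
ds.show()
{code}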






[jira] [Commented] (SPARK-18696) Upgrade sbt plugins

2016-12-03 Thread Weiqing Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15718550#comment-15718550
 ] 

Weiqing Yang commented on SPARK-18696:
--

Oh, yes, thanks for closing this.

> Upgrade sbt plugins
> ---
>
> Key: SPARK-18696
> URL: https://issues.apache.org/jira/browse/SPARK-18696
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Weiqing Yang
>Priority: Minor
>
> For 2.2.x, it's better to make sbt plugins up-to-date. The following sbt 
> plugins will be upgraded:
> {code}
> sbt-assembly: 0.11.2 -> 0.14.3
> sbteclipse-plugin: 4.0.0 -> 5.0.1
> sbt-mima-plugin: 0.1.11 -> 0.1.12
> org.ow2.asm/asm: 5.0.3 -> 5.1 
> org.ow2.asm/asm-commons: 5.0.3 -> 5.1 
> {code}
> All other plugins are up-to-date. 






[jira] [Updated] (SPARK-18697) Upgrade sbt plugins

2016-12-03 Thread Weiqing Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiqing Yang updated SPARK-18697:
-
Target Version/s:   (was: 2.2.0)

> Upgrade sbt plugins
> ---
>
> Key: SPARK-18697
> URL: https://issues.apache.org/jira/browse/SPARK-18697
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Weiqing Yang
>Priority: Trivial
>
> For 2.2.x, it's better to make sbt plugins up-to-date. The following sbt 
> plugins will be upgraded:
> {code}
> sbt-assembly: 0.11.2 -> 0.14.3
> sbteclipse-plugin: 4.0.0 -> 5.0.1
> sbt-mima-plugin: 0.1.11 -> 0.1.12
> org.ow2.asm/asm: 5.0.3 -> 5.1 
> org.ow2.asm/asm-commons: 5.0.3 -> 5.1 
> {code}
> All other plugins are up-to-date. 






[jira] [Commented] (SPARK-18697) Upgrade sbt plugins

2016-12-02 Thread Weiqing Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15717193#comment-15717193
 ] 

Weiqing Yang commented on SPARK-18697:
--

Will submit a PR after SPARK-18638 ([PR 
#16069|https://github.com/apache/spark/pull/16069#issuecomment-264080711]) is 
fixed.

> Upgrade sbt plugins
> ---
>
> Key: SPARK-18697
> URL: https://issues.apache.org/jira/browse/SPARK-18697
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Weiqing Yang
>Priority: Minor
>
> For 2.2.x, it's better to make sbt plugins up-to-date. The following sbt 
> plugins will be upgraded:
> {code}
> sbt-assembly: 0.11.2 -> 0.14.3
> sbteclipse-plugin: 4.0.0 -> 5.0.1
> sbt-mima-plugin: 0.1.11 -> 0.1.12
> org.ow2.asm/asm: 5.0.3 -> 5.1 
> org.ow2.asm/asm-commons: 5.0.3 -> 5.1 
> {code}
> All other plugins are up-to-date. 






[jira] [Created] (SPARK-18697) Upgrade sbt plugins

2016-12-02 Thread Weiqing Yang (JIRA)
Weiqing Yang created SPARK-18697:


 Summary: Upgrade sbt plugins
 Key: SPARK-18697
 URL: https://issues.apache.org/jira/browse/SPARK-18697
 Project: Spark
  Issue Type: Improvement
  Components: Build
Reporter: Weiqing Yang
Priority: Minor


For 2.2.x, it's better to make sbt plugins up-to-date. The following sbt 
plugins will be upgraded:
{code}
sbt-assembly: 0.11.2 -> 0.14.3
sbteclipse-plugin: 4.0.0 -> 5.0.1
sbt-mima-plugin: 0.1.11 -> 0.1.12
org.ow2.asm/asm: 5.0.3 -> 5.1 
org.ow2.asm/asm-commons: 5.0.3 -> 5.1 
{code}

All other plugins are up-to-date. 
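
For illustration, the bumps would land in project/plugins.sbt roughly as below (a
sketch; the organization/artifact coordinates are the commonly used ones and should
be checked against the actual build files):
{code}
// project/plugins.sbt (sketch of the proposed version bumps)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")                // was 0.11.2
addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "5.0.1") // was 4.0.0
addSbtPlugin("com.typesafe" % "sbt-mima-plugin" % "0.1.12")             // was 0.1.11

// The asm / asm-commons bumps (5.0.3 -> 5.1) are ordinary library dependencies:
libraryDependencies += "org.ow2.asm" % "asm" % "5.1"
libraryDependencies += "org.ow2.asm" % "asm-commons" % "5.1"
{code}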






[jira] [Created] (SPARK-18696) Upgrade sbt plugins

2016-12-02 Thread Weiqing Yang (JIRA)
Weiqing Yang created SPARK-18696:


 Summary: Upgrade sbt plugins
 Key: SPARK-18696
 URL: https://issues.apache.org/jira/browse/SPARK-18696
 Project: Spark
  Issue Type: Improvement
  Components: Build
Reporter: Weiqing Yang
Priority: Minor


For 2.2.x, it's better to make sbt plugins up-to-date. The following sbt 
plugins will be upgraded:
{code}
sbt-assembly: 0.11.2 -> 0.14.3
sbteclipse-plugin: 4.0.0 -> 5.0.1
sbt-mima-plugin: 0.1.11 -> 0.1.12
org.ow2.asm/asm: 5.0.3 -> 5.1 
org.ow2.asm/asm-commons: 5.0.3 -> 5.1 
{code}

All other plugins are up-to-date. 






[jira] [Updated] (SPARK-18638) Upgrade sbt, zinc and maven plugins

2016-11-30 Thread Weiqing Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiqing Yang updated SPARK-18638:
-
Description: 
v2.1.0-rc1 has been out. For 2.2.x, it is better to keep sbt up-to-date, and 
upgrade it from 0.13.11 to 0.13.13. The release notes since the last version we 
used are: https://github.com/sbt/sbt/releases/tag/v0.13.12 and 
https://github.com/sbt/sbt/releases/tag/v0.13.13. Both releases include some 
regression fixes. This jira will also update Zinc and Maven plugins.

{code}
   sbt: 0.13.11 -> 0.13.13,
   zinc: 0.3.9 -> 0.3.11,
   maven-assembly-plugin: 2.6 -> 3.0.0
   maven-compiler-plugin: 3.5.1 -> 3.6.
   maven-jar-plugin: 2.6 -> 3.0.2
   maven-javadoc-plugin: 2.10.3 -> 2.10.4
   maven-source-plugin: 2.4 -> 3.0.1
   org.codehaus.mojo:build-helper-maven-plugin: 1.10 -> 1.12
   org.codehaus.mojo:exec-maven-plugin: 1.4.0 -> 1.5.0
{code}

  was:v2.1.0-rc1 has been out. For 2.2.x, it is better to keep sbt up-to-date, 
and upgrade it from 0.13.11 to 0.13.13. The release notes since the last 
version we used are: https://github.com/sbt/sbt/releases/tag/v0.13.12 and 
https://github.com/sbt/sbt/releases/tag/v0.13.13. Both releases include some 
regression fixes. 


> Upgrade sbt, zinc and maven plugins
> ---
>
> Key: SPARK-18638
> URL: https://issues.apache.org/jira/browse/SPARK-18638
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Weiqing Yang
>Priority: Minor
>
> v2.1.0-rc1 has been out. For 2.2.x, it is better to keep sbt up-to-date, and 
> upgrade it from 0.13.11 to 0.13.13. The release notes since the last version 
> we used are: https://github.com/sbt/sbt/releases/tag/v0.13.12 and 
> https://github.com/sbt/sbt/releases/tag/v0.13.13. Both releases include some 
> regression fixes. This jira will also update Zinc and Maven plugins.
> {code}
>sbt: 0.13.11 -> 0.13.13,
>zinc: 0.3.9 -> 0.3.11,
>maven-assembly-plugin: 2.6 -> 3.0.0
>maven-compiler-plugin: 3.5.1 -> 3.6.
>maven-jar-plugin: 2.6 -> 3.0.2
>maven-javadoc-plugin: 2.10.3 -> 2.10.4
>maven-source-plugin: 2.4 -> 3.0.1
>org.codehaus.mojo:build-helper-maven-plugin: 1.10 -> 1.12
>org.codehaus.mojo:exec-maven-plugin: 1.4.0 -> 1.5.0
> {code}






[jira] [Updated] (SPARK-18638) Upgrade sbt, zinc and maven plugins

2016-11-30 Thread Weiqing Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiqing Yang updated SPARK-18638:
-
Summary: Upgrade sbt, zinc and maven plugins  (was: Upgrade sbt to 0.13.13)

> Upgrade sbt, zinc and maven plugins
> ---
>
> Key: SPARK-18638
> URL: https://issues.apache.org/jira/browse/SPARK-18638
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Weiqing Yang
>Priority: Minor
>
> v2.1.0-rc1 has been out. For 2.2.x, it is better to keep sbt up-to-date, and 
> upgrade it from 0.13.11 to 0.13.13. The release notes since the last version 
> we used are: https://github.com/sbt/sbt/releases/tag/v0.13.12 and 
> https://github.com/sbt/sbt/releases/tag/v0.13.13. Both releases include some 
> regression fixes. 






[jira] [Created] (SPARK-18638) Upgrade sbt to 0.13.13

2016-11-29 Thread Weiqing Yang (JIRA)
Weiqing Yang created SPARK-18638:


 Summary: Upgrade sbt to 0.13.13
 Key: SPARK-18638
 URL: https://issues.apache.org/jira/browse/SPARK-18638
 Project: Spark
  Issue Type: Improvement
  Components: Build
Reporter: Weiqing Yang
Priority: Minor


v2.1.0-rc1 has been out. For 2.2.x, it is better to keep sbt up-to-date, and 
upgrade it from 0.13.11 to 0.13.13. The release notes since the last version we 
used are: https://github.com/sbt/sbt/releases/tag/v0.13.12 and 
https://github.com/sbt/sbt/releases/tag/v0.13.13. Both releases include some 
regression fixes. 






[jira] [Created] (SPARK-18629) Fix numPartition of JDBCSuite Testcase

2016-11-29 Thread Weiqing Yang (JIRA)
Weiqing Yang created SPARK-18629:


 Summary: Fix numPartition of JDBCSuite Testcase
 Key: SPARK-18629
 URL: https://issues.apache.org/jira/browse/SPARK-18629
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Weiqing Yang
Priority: Minor


When running any one of the test cases in JDBCSuite, you will get the following 
warning.

{code}
10:34:26.389 WARN org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation: 
The number of partitions is reduced because the specified number of partitions 
is less than the difference between upper bound and lower bound. Updated number 
of partitions: 3; Input number of partitions: 4; Lower bound: 1; Upper bound: 
4.{code}

This jira is to fix it.
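
For reference, the constraint behind the warning: with a numeric partition column
bounded by lowerBound and upperBound, JDBCRelation can create at most
(upperBound - lowerBound) partitions, so a test asking for 4 partitions over the
bounds 1..4 is trimmed to 3. A hedged sketch of a consistent read (the URL, table
and column names are made up for illustration, and {{spark}} is a SparkSession):
{code}
// Sketch: request a partition count that the bounds can actually support.
// With lowerBound = 1 and upperBound = 4, the maximum is upperBound - lowerBound = 3.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:h2:mem:testdb0")   // made-up URL
  .option("dbtable", "TEST.PEOPLE")       // made-up table
  .option("partitionColumn", "THEID")     // made-up numeric column
  .option("lowerBound", "1")
  .option("upperBound", "4")
  .option("numPartitions", "3")           // 3 instead of 4 avoids the warning
  .load()
{code}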






[jira] [Closed] (SPARK-18521) Add `NoRedundantStringInterpolator` Scala rule

2016-11-26 Thread Weiqing Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiqing Yang closed SPARK-18521.

Resolution: Won't Fix

> Add `NoRedundantStringInterpolator` Scala rule
> --
>
> Key: SPARK-18521
> URL: https://issues.apache.org/jira/browse/SPARK-18521
> Project: Spark
>  Issue Type: Improvement
>Reporter: Weiqing Yang
>
> Currently the s string interpolator is used in many cases in which there is 
> no embedded variable reference in the processed string literals. 
> For example:
> core/src/main/scala/org/apache/spark/deploy/Client.scala
> {code}
> logError(s"Error processing messages, exiting.")
> {code}
> examples/src/main/scala/org/apache/spark/examples/graphx/SynthBenchmark.scala
> {code}
> println(s"Creating graph...")
> {code}
> examples/src/main/scala/org/apache/spark/examples/mllib/LDAExample.scala
> {code}
> println(s"Corpus summary:")
> {code}
> sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLViewSuite.scala
> {code}
> test(s"correctly handle CREATE OR REPLACE TEMPORARY VIEW") {
> {code}
> We can add a new scala style rule 'NoRedundantStringInterpolator' to prevent 
> unnecessary string interpolators. 






[jira] [Created] (SPARK-18521) Add `NoRedundantStringInterpolator` Scala rule

2016-11-20 Thread Weiqing Yang (JIRA)
Weiqing Yang created SPARK-18521:


 Summary: Add `NoRedundantStringInterpolator` Scala rule
 Key: SPARK-18521
 URL: https://issues.apache.org/jira/browse/SPARK-18521
 Project: Spark
  Issue Type: Improvement
Reporter: Weiqing Yang


Currently the s string interpolator is used in many cases in which there is no 
embedded variable reference in the processed string literals. 
For example:
core/src/main/scala/org/apache/spark/deploy/Client.scala
{code}
logError(s"Error processing messages, exiting.")
{code}

examples/src/main/scala/org/apache/spark/examples/graphx/SynthBenchmark.scala
{code}
println(s"Creating graph...")
{code}

examples/src/main/scala/org/apache/spark/examples/mllib/LDAExample.scala
{code}
println(s"Corpus summary:")
{code}

sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLViewSuite.scala
{code}
test(s"correctly handle CREATE OR REPLACE TEMPORARY VIEW") {
{code}

We can add a new scala style rule 'NoRedundantStringInterpolator' to prevent 
unnecessary string interpolators. 
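
At each call site the fix itself is trivial: drop the interpolator when the literal
has no ${...} substitution, for example:
{code}
// Redundant: there is no ${...} substitution, so the `s` prefix does nothing.
println(s"Creating graph...")
// Preferred:
println("Creating graph...")
{code}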






[jira] [Updated] (SPARK-18417) Define 'spark.yarn.am.port' in yarn config object

2016-11-11 Thread Weiqing Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiqing Yang updated SPARK-18417:
-
Description: Usually Yarn configurations are defined in yarn/config.scala, 
and are used everywhere. So we should make 'spark.yarn.am.port' defined in yarn 
config.scala as well, making code easier to maintain.  (was: Usually Yarn 
configurations are defined in yarn/config.scala, and then use them everywhere. 
So we should define 'spark.yarn.am.port' in yarn config.scala too, that will 
make code easier to maintain.)

> Define 'spark.yarn.am.port' in yarn config object
> -
>
> Key: SPARK-18417
> URL: https://issues.apache.org/jira/browse/SPARK-18417
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Reporter: Weiqing Yang
>Priority: Minor
>
> Usually Yarn configurations are defined in yarn/config.scala, and are used 
> everywhere. So we should make 'spark.yarn.am.port' defined in yarn 
> config.scala as well, making code easier to maintain.






[jira] [Created] (SPARK-18417) Define 'spark.yarn.am.port' in yarn config object

2016-11-11 Thread Weiqing Yang (JIRA)
Weiqing Yang created SPARK-18417:


 Summary: Define 'spark.yarn.am.port' in yarn config object
 Key: SPARK-18417
 URL: https://issues.apache.org/jira/browse/SPARK-18417
 Project: Spark
  Issue Type: Improvement
  Components: YARN
Reporter: Weiqing Yang
Priority: Minor


Usually YARN configurations are defined in yarn/config.scala and then used 
everywhere, so we should define 'spark.yarn.am.port' in yarn config.scala too; 
that will make the code easier to maintain.
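
A hedged sketch of what the entry in yarn/config.scala could look like, following
the ConfigBuilder pattern used by the other entries in that file (the default value
and doc text below are assumptions, not the final change):
{code}
import org.apache.spark.internal.config.ConfigBuilder

// Sketch of a typed config entry for spark.yarn.am.port.
private[spark] val AM_PORT =
  ConfigBuilder("spark.yarn.am.port")
    .doc("Port the YARN Application Master binds to; 0 means pick a random port.")
    .intConf
    .createWithDefault(0)
{code}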






[jira] [Commented] (SPARK-17714) ClassCircularityError is thrown when using org.apache.spark.util.Utils.classForName 

2016-10-13 Thread Weiqing Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572580#comment-15572580
 ] 

Weiqing Yang commented on SPARK-17714:
--

Not yet, need to investigate more. Could we pull in people more familiar with 
the Repl classloader stuff? Thanks.

> ClassCircularityError is thrown when using 
> org.apache.spark.util.Utils.classForName 
> 
>
> Key: SPARK-17714
> URL: https://issues.apache.org/jira/browse/SPARK-17714
> Project: Spark
>  Issue Type: Bug
>Reporter: Weiqing Yang
>
> This jira is a follow-up to 
> [SPARK-15857|https://issues.apache.org/jira/browse/SPARK-15857].
> A task invokes CallerContext.setCurrentContext() to set its caller context for 
> HDFS. In setCurrentContext(), it tries looking for the class 
> {{org.apache.hadoop.ipc.CallerContext}} by using 
> {{org.apache.spark.util.Utils.classForName}}. This causes 
> ClassCircularityError to be thrown when running ReplSuite in master Maven 
> builds (The same tests pass in the SBT build). A hotfix 
> [SPARK-17710|https://issues.apache.org/jira/browse/SPARK-17710] has been made 
> by using Class.forName instead, but it needs further investigation.
> Error:
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.3/2000/testReport/junit/org.apache.spark.repl/ReplSuite/simple_foreach_with_accumulator/
> {code}
> scala> accum: org.apache.spark.util.LongAccumulator = LongAccumulator(id: 0, 
> name: None, value: 0)
> scala> org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in 
> stage 0.0 (TID 0, localhost): java.lang.ClassCircularityError: 
> io/netty/util/internal/_matchers_/org/apache/spark/network/protocol/MessageMatcher
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:348)
> at 
> io.netty.util.internal.JavassistTypeParameterMatcherGenerator.generate(JavassistTypeParameterMatcherGenerator.java:62)
> at 
> io.netty.util.internal.JavassistTypeParameterMatcherGenerator.generate(JavassistTypeParameterMatcherGenerator.java:54)
> at 
> io.netty.util.internal.TypeParameterMatcher.get(TypeParameterMatcher.java:42)
> at 
> io.netty.util.internal.TypeParameterMatcher.find(TypeParameterMatcher.java:78)
> at 
> io.netty.handler.codec.MessageToMessageEncoder.<init>(MessageToMessageEncoder.java:59)
> at 
> org.apache.spark.network.protocol.MessageEncoder.<init>(MessageEncoder.java:34)
> at org.apache.spark.network.TransportContext.<init>(TransportContext.java:78)
> at 
> org.apache.spark.rpc.netty.NettyRpcEnv.downloadClient(NettyRpcEnv.scala:354)
> at org.apache.spark.rpc.netty.NettyRpcEnv.openChannel(NettyRpcEnv.scala:324)
> at 
> org.apache.spark.repl.ExecutorClassLoader.org$apache$spark$repl$ExecutorClassLoader$$getClassFileInputStreamFromSparkRPC(ExecutorClassLoader.scala:90)
> at 
> org.apache.spark.repl.ExecutorClassLoader$$anonfun$1.apply(ExecutorClassLoader.scala:57)
> at 
> org.apache.spark.repl.ExecutorClassLoader$$anonfun$1.apply(ExecutorClassLoader.scala:57)
> at 
> org.apache.spark.repl.ExecutorClassLoader.findClassLocally(ExecutorClassLoader.scala:162)
> at 
> org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:80)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:348)
> at 
> io.netty.util.internal.JavassistTypeParameterMatcherGenerator.generate(JavassistTypeParameterMatcherGenerator.java:62)
> at 
> io.netty.util.internal.JavassistTypeParameterMatcherGenerator.generate(JavassistTypeParameterMatcherGenerator.java:54)
> at 
> io.netty.util.internal.TypeParameterMatcher.get(TypeParameterMatcher.java:42)
> at 
> io.netty.util.internal.TypeParameterMatcher.find(TypeParameterMatcher.java:78)
> at 
> io.netty.handler.codec.MessageToMessageEncoder.<init>(MessageToMessageEncoder.java:59)
> at 
> org.apache.spark.network.protocol.MessageEncoder.<init>(MessageEncoder.java:34)
> at org.apache.spark.network.TransportContext.<init>(TransportContext.java:78)
> at 
> org.apache.spark.rpc.netty.NettyRpcEnv.downloadClient(NettyRpcEnv.scala:354)
> at org.apache.spark.rpc.netty.NettyRpcEnv.openChannel(NettyRpcEnv.scala:324)
> at 
> org.apache.spark.repl.ExecutorClassLoader.org$apache$spark$repl$ExecutorClassLoader$$getClassFileInputStreamFromSparkRPC(ExecutorClassLoader.scala:90)
> at 
> org.apache.spark.repl.ExecutorClassLoader$$anonfun$1.apply(ExecutorClassLoader.scala:57)
> at 
> org.apache.spark.repl.ExecutorClassLoader$$anonfun$1.apply(ExecutorClassLoader.scala:57)
> at 
> org.apache.spark.repl.ExecutorClassLoader.findClassLocally(ExecutorClassLoader.scala:162)
> at 
> 

[jira] [Commented] (SPARK-16757) Set up caller context to HDFS and Yarn

2016-09-28 Thread Weiqing Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15531253#comment-15531253
 ] 

Weiqing Yang commented on SPARK-16757:
--

[SPARK-17714|https://issues.apache.org/jira/browse/SPARK-17714] has been 
created for further investigation.

> Set up caller context to HDFS and Yarn
> --
>
> Key: SPARK-16757
> URL: https://issues.apache.org/jira/browse/SPARK-16757
> Project: Spark
>  Issue Type: Sub-task
>Reporter: Weiqing Yang
>Assignee: Weiqing Yang
> Fix For: 2.1.0
>
>
> In this jira, Spark will invoke the Hadoop caller context API to set up its 
> caller context for HDFS.






[jira] [Updated] (SPARK-17714) ClassCircularityError is thrown when using org.apache.spark.util.Utils.classForName 

2016-09-28 Thread Weiqing Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiqing Yang updated SPARK-17714:
-
Description: 
This jira is a follow-up to 
[SPARK-15857|https://issues.apache.org/jira/browse/SPARK-15857].

A task invokes CallerContext.setCurrentContext() to set its caller context for 
HDFS. In setCurrentContext(), it tries looking for the class 
{{org.apache.hadoop.ipc.CallerContext}} by using 
{{org.apache.spark.util.Utils.classForName}}. This causes ClassCircularityError 
to be thrown when running ReplSuite in master Maven builds (The same tests pass 
in the SBT build). A hotfix 
[SPARK-17710|https://issues.apache.org/jira/browse/SPARK-17710] has been made 
by using Class.forName instead, but it needs further investigation.
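
For reference, a hedged sketch of the hotfix direction (one plausible reading of
SPARK-17710, not the exact patch): probe for the Hadoop class with plain
Class.forName instead of Utils.classForName.
{code}
// Hedged sketch: detect whether org.apache.hadoop.ipc.CallerContext is available
// without going through Utils.classForName.
val callerContextSupported: Boolean =
  try {
    // scalastyle:off classforname
    Class.forName("org.apache.hadoop.ipc.CallerContext")
    // scalastyle:on classforname
    true
  } catch {
    case _: ClassNotFoundException => false
  }
{code}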

Error:
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.3/2000/testReport/junit/org.apache.spark.repl/ReplSuite/simple_foreach_with_accumulator/
{code}
scala> accum: org.apache.spark.util.LongAccumulator = LongAccumulator(id: 0, 
name: None, value: 0)
scala> org.apache.spark.SparkException: Job aborted due to stage failure: Task 
0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 
(TID 0, localhost): java.lang.ClassCircularityError: 
io/netty/util/internal/_matchers_/org/apache/spark/network/protocol/MessageMatcher
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at 
io.netty.util.internal.JavassistTypeParameterMatcherGenerator.generate(JavassistTypeParameterMatcherGenerator.java:62)
at 
io.netty.util.internal.JavassistTypeParameterMatcherGenerator.generate(JavassistTypeParameterMatcherGenerator.java:54)
at io.netty.util.internal.TypeParameterMatcher.get(TypeParameterMatcher.java:42)
at 
io.netty.util.internal.TypeParameterMatcher.find(TypeParameterMatcher.java:78)
at 
io.netty.handler.codec.MessageToMessageEncoder.<init>(MessageToMessageEncoder.java:59)
at 
org.apache.spark.network.protocol.MessageEncoder.<init>(MessageEncoder.java:34)
at org.apache.spark.network.TransportContext.<init>(TransportContext.java:78)
at org.apache.spark.rpc.netty.NettyRpcEnv.downloadClient(NettyRpcEnv.scala:354)
at org.apache.spark.rpc.netty.NettyRpcEnv.openChannel(NettyRpcEnv.scala:324)
at 
org.apache.spark.repl.ExecutorClassLoader.org$apache$spark$repl$ExecutorClassLoader$$getClassFileInputStreamFromSparkRPC(ExecutorClassLoader.scala:90)
at 
org.apache.spark.repl.ExecutorClassLoader$$anonfun$1.apply(ExecutorClassLoader.scala:57)
at 
org.apache.spark.repl.ExecutorClassLoader$$anonfun$1.apply(ExecutorClassLoader.scala:57)
at 
org.apache.spark.repl.ExecutorClassLoader.findClassLocally(ExecutorClassLoader.scala:162)
at 
org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:80)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at 
io.netty.util.internal.JavassistTypeParameterMatcherGenerator.generate(JavassistTypeParameterMatcherGenerator.java:62)
at 
io.netty.util.internal.JavassistTypeParameterMatcherGenerator.generate(JavassistTypeParameterMatcherGenerator.java:54)
at io.netty.util.internal.TypeParameterMatcher.get(TypeParameterMatcher.java:42)
at 
io.netty.util.internal.TypeParameterMatcher.find(TypeParameterMatcher.java:78)
at 
io.netty.handler.codec.MessageToMessageEncoder.<init>(MessageToMessageEncoder.java:59)
at 
org.apache.spark.network.protocol.MessageEncoder.<init>(MessageEncoder.java:34)
at org.apache.spark.network.TransportContext.<init>(TransportContext.java:78)
at org.apache.spark.rpc.netty.NettyRpcEnv.downloadClient(NettyRpcEnv.scala:354)
at org.apache.spark.rpc.netty.NettyRpcEnv.openChannel(NettyRpcEnv.scala:324)
at 
org.apache.spark.repl.ExecutorClassLoader.org$apache$spark$repl$ExecutorClassLoader$$getClassFileInputStreamFromSparkRPC(ExecutorClassLoader.scala:90)
at 
org.apache.spark.repl.ExecutorClassLoader$$anonfun$1.apply(ExecutorClassLoader.scala:57)
at 
org.apache.spark.repl.ExecutorClassLoader$$anonfun$1.apply(ExecutorClassLoader.scala:57)
at 
org.apache.spark.repl.ExecutorClassLoader.findClassLocally(ExecutorClassLoader.scala:162)
at 
org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:80)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:225)
at org.apache.spark.util.CallerContext.setCurrentContext(Utils.scala:2492)
at org.apache.spark.scheduler.Task.run(Task.scala:96)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 

[jira] [Updated] (SPARK-17714) ClassCircularityError is thrown when using org.apache.spark.util.Utils.classForName 

2016-09-28 Thread Weiqing Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiqing Yang updated SPARK-17714:
-
Description: 
This jira is related to 
[SPARK-15857|https://issues.apache.org/jira/browse/SPARK-15857].

A task invokes CallerContext.setCurrentContext() to set its caller context for 
HDFS. In setCurrentContext(), it tries looking for the class 
{{org.apache.hadoop.ipc.CallerContext}} by using 
{{org.apache.spark.util.Utils.classForName}}. This causes ClassCircularityError 
to be thrown when running ReplSuite in master Maven builds (The same tests pass 
in the SBT build). A hotfix 
[SPARK-17710|https://issues.apache.org/jira/browse/SPARK-17710] has been made 
by using Class.forName instead, but it needs further investigation.

Error:
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.3/2000/testReport/junit/org.apache.spark.repl/ReplSuite/simple_foreach_with_accumulator/
{code}
scala> accum: org.apache.spark.util.LongAccumulator = LongAccumulator(id: 0, 
name: None, value: 0)
scala> org.apache.spark.SparkException: Job aborted due to stage failure: Task 
0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 
(TID 0, localhost): java.lang.ClassCircularityError: 
io/netty/util/internal/_matchers_/org/apache/spark/network/protocol/MessageMatcher
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at 
io.netty.util.internal.JavassistTypeParameterMatcherGenerator.generate(JavassistTypeParameterMatcherGenerator.java:62)
at 
io.netty.util.internal.JavassistTypeParameterMatcherGenerator.generate(JavassistTypeParameterMatcherGenerator.java:54)
at io.netty.util.internal.TypeParameterMatcher.get(TypeParameterMatcher.java:42)
at 
io.netty.util.internal.TypeParameterMatcher.find(TypeParameterMatcher.java:78)
at 
io.netty.handler.codec.MessageToMessageEncoder.<init>(MessageToMessageEncoder.java:59)
at 
org.apache.spark.network.protocol.MessageEncoder.<init>(MessageEncoder.java:34)
at org.apache.spark.network.TransportContext.<init>(TransportContext.java:78)
at org.apache.spark.rpc.netty.NettyRpcEnv.downloadClient(NettyRpcEnv.scala:354)
at org.apache.spark.rpc.netty.NettyRpcEnv.openChannel(NettyRpcEnv.scala:324)
at 
org.apache.spark.repl.ExecutorClassLoader.org$apache$spark$repl$ExecutorClassLoader$$getClassFileInputStreamFromSparkRPC(ExecutorClassLoader.scala:90)
at 
org.apache.spark.repl.ExecutorClassLoader$$anonfun$1.apply(ExecutorClassLoader.scala:57)
at 
org.apache.spark.repl.ExecutorClassLoader$$anonfun$1.apply(ExecutorClassLoader.scala:57)
at 
org.apache.spark.repl.ExecutorClassLoader.findClassLocally(ExecutorClassLoader.scala:162)
at 
org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:80)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at 
io.netty.util.internal.JavassistTypeParameterMatcherGenerator.generate(JavassistTypeParameterMatcherGenerator.java:62)
at 
io.netty.util.internal.JavassistTypeParameterMatcherGenerator.generate(JavassistTypeParameterMatcherGenerator.java:54)
at io.netty.util.internal.TypeParameterMatcher.get(TypeParameterMatcher.java:42)
at 
io.netty.util.internal.TypeParameterMatcher.find(TypeParameterMatcher.java:78)
at 
io.netty.handler.codec.MessageToMessageEncoder.<init>(MessageToMessageEncoder.java:59)
at 
org.apache.spark.network.protocol.MessageEncoder.<init>(MessageEncoder.java:34)
at org.apache.spark.network.TransportContext.<init>(TransportContext.java:78)
at org.apache.spark.rpc.netty.NettyRpcEnv.downloadClient(NettyRpcEnv.scala:354)
at org.apache.spark.rpc.netty.NettyRpcEnv.openChannel(NettyRpcEnv.scala:324)
at 
org.apache.spark.repl.ExecutorClassLoader.org$apache$spark$repl$ExecutorClassLoader$$getClassFileInputStreamFromSparkRPC(ExecutorClassLoader.scala:90)
at 
org.apache.spark.repl.ExecutorClassLoader$$anonfun$1.apply(ExecutorClassLoader.scala:57)
at 
org.apache.spark.repl.ExecutorClassLoader$$anonfun$1.apply(ExecutorClassLoader.scala:57)
at 
org.apache.spark.repl.ExecutorClassLoader.findClassLocally(ExecutorClassLoader.scala:162)
at 
org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:80)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:225)
at org.apache.spark.util.CallerContext.setCurrentContext(Utils.scala:2492)
at org.apache.spark.scheduler.Task.run(Task.scala:96)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 

[jira] [Created] (SPARK-17714) ClassCircularityError is thrown when using org.apache.spark.util.Utils.classForName 

2016-09-28 Thread Weiqing Yang (JIRA)
Weiqing Yang created SPARK-17714:


 Summary: ClassCircularityError is thrown when using 
org.apache.spark.util.Utils.classForName 
 Key: SPARK-17714
 URL: https://issues.apache.org/jira/browse/SPARK-17714
 Project: Spark
  Issue Type: Bug
Reporter: Weiqing Yang


This jira is a follow-up to [SPARK-15857|https://issues.apache.org/jira/browse/SPARK-15857].

Task invokes CallerContext.setCurrentContext() to set its callerContext to 
HDFS. In setCurrentContext(), it tries to look up the class 
{{org.apache.hadoop.ipc.CallerContext}} via 
{{org.apache.spark.util.Utils.classForName}}. This causes a ClassCircularityError 
to be thrown when running ReplSuite in master Maven builds (the same tests pass 
in the SBT build). A hotfix, 
[SPARK-17710|https://issues.apache.org/jira/browse/SPARK-17710], has been made 
by using Class.forName instead, but it needs further investigation.

Error:
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.3/2000/testReport/junit/org.apache.spark.repl/ReplSuite/simple_foreach_with_accumulator/
{code}
scala> accum: org.apache.spark.util.LongAccumulator = LongAccumulator(id: 0, 
name: None, value: 0)
scala> org.apache.spark.SparkException: Job aborted due to stage failure: Task 
0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 
(TID 0, localhost): java.lang.ClassCircularityError: 
io/netty/util/internal/_matchers_/org/apache/spark/network/protocol/MessageMatcher
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at 
io.netty.util.internal.JavassistTypeParameterMatcherGenerator.generate(JavassistTypeParameterMatcherGenerator.java:62)
at 
io.netty.util.internal.JavassistTypeParameterMatcherGenerator.generate(JavassistTypeParameterMatcherGenerator.java:54)
at io.netty.util.internal.TypeParameterMatcher.get(TypeParameterMatcher.java:42)
at 
io.netty.util.internal.TypeParameterMatcher.find(TypeParameterMatcher.java:78)
at 
io.netty.handler.codec.MessageToMessageEncoder.<init>(MessageToMessageEncoder.java:59)
at 
org.apache.spark.network.protocol.MessageEncoder.<init>(MessageEncoder.java:34)
at org.apache.spark.network.TransportContext.<init>(TransportContext.java:78)
at org.apache.spark.rpc.netty.NettyRpcEnv.downloadClient(NettyRpcEnv.scala:354)
at org.apache.spark.rpc.netty.NettyRpcEnv.openChannel(NettyRpcEnv.scala:324)
at 
org.apache.spark.repl.ExecutorClassLoader.org$apache$spark$repl$ExecutorClassLoader$$getClassFileInputStreamFromSparkRPC(ExecutorClassLoader.scala:90)
at 
org.apache.spark.repl.ExecutorClassLoader$$anonfun$1.apply(ExecutorClassLoader.scala:57)
at 
org.apache.spark.repl.ExecutorClassLoader$$anonfun$1.apply(ExecutorClassLoader.scala:57)
at 
org.apache.spark.repl.ExecutorClassLoader.findClassLocally(ExecutorClassLoader.scala:162)
at 
org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:80)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at 
io.netty.util.internal.JavassistTypeParameterMatcherGenerator.generate(JavassistTypeParameterMatcherGenerator.java:62)
at 
io.netty.util.internal.JavassistTypeParameterMatcherGenerator.generate(JavassistTypeParameterMatcherGenerator.java:54)
at io.netty.util.internal.TypeParameterMatcher.get(TypeParameterMatcher.java:42)
at 
io.netty.util.internal.TypeParameterMatcher.find(TypeParameterMatcher.java:78)
at 
io.netty.handler.codec.MessageToMessageEncoder.<init>(MessageToMessageEncoder.java:59)
at 
org.apache.spark.network.protocol.MessageEncoder.<init>(MessageEncoder.java:34)
at org.apache.spark.network.TransportContext.<init>(TransportContext.java:78)
at org.apache.spark.rpc.netty.NettyRpcEnv.downloadClient(NettyRpcEnv.scala:354)
at org.apache.spark.rpc.netty.NettyRpcEnv.openChannel(NettyRpcEnv.scala:324)
at 
org.apache.spark.repl.ExecutorClassLoader.org$apache$spark$repl$ExecutorClassLoader$$getClassFileInputStreamFromSparkRPC(ExecutorClassLoader.scala:90)
at 
org.apache.spark.repl.ExecutorClassLoader$$anonfun$1.apply(ExecutorClassLoader.scala:57)
at 
org.apache.spark.repl.ExecutorClassLoader$$anonfun$1.apply(ExecutorClassLoader.scala:57)
at 
org.apache.spark.repl.ExecutorClassLoader.findClassLocally(ExecutorClassLoader.scala:162)
at 
org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:80)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:225)
at org.apache.spark.util.CallerContext.setCurrentContext(Utils.scala:2492)
at org.apache.spark.scheduler.Task.run(Task.scala:96)
at 

[jira] [Commented] (SPARK-17710) ReplSuite fails with ClassCircularityError in master Maven builds

2016-09-28 Thread Weiqing Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15531062#comment-15531062
 ] 

Weiqing Yang commented on SPARK-17710:
--

A PR https://github.com/apache/spark/pull/15286 has been created to resolve 
this. 


> ReplSuite fails with ClassCircularityError in master Maven builds
> -
>
> Key: SPARK-17710
> URL: https://issues.apache.org/jira/browse/SPARK-17710
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.1.0
>Reporter: Josh Rosen
>Priority: Critical
>
> The master Maven build is currently broken because ReplSuite consistently 
> fails with ClassCircularityErrors. See 
> https://spark-tests.appspot.com/jobs/spark-master-test-maven-hadoop-2.3 for a 
> timeline of the failure.
> Here's the first build where this failed: 
> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.3/2000/
> This appears to correspond to 
> https://github.com/apache/spark/commit/6a68c5d7b4eb07e4ed6b702dd1536cd08d9bba7d
> The same tests pass in the SBT build.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16757) Set up caller context to HDFS and Yarn

2016-09-20 Thread Weiqing Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiqing Yang updated SPARK-16757:
-
Summary: Set up caller context to HDFS and Yarn  (was: Set up caller 
context to HDFS)

> Set up caller context to HDFS and Yarn
> --
>
> Key: SPARK-16757
> URL: https://issues.apache.org/jira/browse/SPARK-16757
> Project: Spark
>  Issue Type: Sub-task
>Reporter: Weiqing Yang
>
> In this jira, Spark will invoke hadoop caller context api to set up its 
> caller context to HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-16758) Set up caller context to YARN

2016-09-20 Thread Weiqing Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiqing Yang resolved SPARK-16758.
--
Resolution: Duplicate

The code for this jira has been put into the PR of SPARK-16757.

> Set up caller context to YARN
> -
>
> Key: SPARK-16758
> URL: https://issues.apache.org/jira/browse/SPARK-16758
> Project: Spark
>  Issue Type: Sub-task
>Reporter: Weiqing Yang
>
> In this jira, Spark will invoke hadoop caller context api to set up its 
> caller context to YARN.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-17220) Upgrade Py4J to 0.10.3

2016-08-24 Thread Weiqing Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435470#comment-15435470
 ] 

Weiqing Yang edited comment on SPARK-17220 at 8/24/16 6:56 PM:
---

Oh. I see. Thanks for resolving this.


was (Author: weiqingyang):
Oh. I see. Thanks for resolved this.

> Upgrade Py4J to 0.10.3
> --
>
> Key: SPARK-17220
> URL: https://issues.apache.org/jira/browse/SPARK-17220
> Project: Spark
>  Issue Type: Improvement
>Reporter: Weiqing Yang
>Priority: Minor
>
> Py4J 0.10.3 has landed. It includes some important bug fixes. For example:
> Both sides: fixed memory leak issue with ClientServer and potential deadlock 
> issue by creating a memory leak test suite. (Py4J 0.10.2)
> Both sides: added more memory leak tests and fixed a potential memory leak 
> related to listeners. (Py4J 0.10.3)
> So it's time to upgrade py4j from 0.10.1 to 0.10.3. The changelog is 
> available at https://www.py4j.org/changelog.html 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17220) Upgrade Py4J to 0.10.3

2016-08-24 Thread Weiqing Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435470#comment-15435470
 ] 

Weiqing Yang commented on SPARK-17220:
--

Oh. I see. Thanks for resolving this.

> Upgrade Py4J to 0.10.3
> --
>
> Key: SPARK-17220
> URL: https://issues.apache.org/jira/browse/SPARK-17220
> Project: Spark
>  Issue Type: Improvement
>Reporter: Weiqing Yang
>Priority: Minor
>
> Py4J 0.10.3 has landed. It includes some important bug fixes. For example:
> Both sides: fixed memory leak issue with ClientServer and potential deadlock 
> issue by creating a memory leak test suite. (Py4J 0.10.2)
> Both sides: added more memory leak tests and fixed a potential memory leak 
> related to listeners. (Py4J 0.10.3)
> So it's time to upgrade py4j from 0.10.1 to 0.10.3. The changelog is 
> available at https://www.py4j.org/changelog.html 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-17220) Upgrade Py4J to 0.10.3

2016-08-24 Thread Weiqing Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiqing Yang updated SPARK-17220:
-
Description: 
Py4J 0.10.3 has landed. It includes some important bug fixes. For example:
Both sides: fixed memory leak issue with ClientServer and potential deadlock 
issue by creating a memory leak test suite. (Py4J 0.10.2)
Both sides: added more memory leak tests and fixed a potential memory leak 
related to listeners. (Py4J 0.10.3)
So it's time to upgrade py4j from 0.10.1 to 0.10.3. The changelog is available 
at https://www.py4j.org/changelog.html 


  was:
Py4J 0.10.3 has landed. It includes some important bug fixes. For example:
Both sides: fixed memory leak issue with ClientServer and potential deadlock 
issue by creating a memory leak test suite.
Both sides: added more memory leak tests and fixed a potential memory leak 
related to listeners.
So it's time to upgrade py4j to 0.10.3. The changelog is available at 
https://www.py4j.org/changelog.html 



> Upgrade Py4J to 0.10.3
> --
>
> Key: SPARK-17220
> URL: https://issues.apache.org/jira/browse/SPARK-17220
> Project: Spark
>  Issue Type: Improvement
>Reporter: Weiqing Yang
>Priority: Minor
>
> Py4J 0.10.3 has landed. It includes some important bug fixes. For example:
> Both sides: fixed memory leak issue with ClientServer and potential deadlock 
> issue by creating a memory leak test suite. (Py4J 0.10.2)
> Both sides: added more memory leak tests and fixed a potential memory leak 
> related to listeners. (Py4J 0.10.3)
> So it's time to upgrade py4j from 0.10.1 to 0.10.3. The changelog is 
> available at https://www.py4j.org/changelog.html 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-17220) Upgrade Py4J to 0.10.3

2016-08-24 Thread Weiqing Yang (JIRA)
Weiqing Yang created SPARK-17220:


 Summary: Upgrade Py4J to 0.10.3
 Key: SPARK-17220
 URL: https://issues.apache.org/jira/browse/SPARK-17220
 Project: Spark
  Issue Type: Improvement
Reporter: Weiqing Yang
Priority: Minor


Py4J 0.10.3 has landed. It includes some important bug fixes. For example:
Both sides: fixed memory leak issue with ClientServer and potential deadlock 
issue by creating a memory leak test suite.
Both sides: added more memory leak tests and fixed a potential memory leak 
related to listeners.
So it's time to upgrade py4j to 0.10.3. The changelog is available at 
https://www.py4j.org/changelog.html 




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16757) Set up caller context to HDFS

2016-08-19 Thread Weiqing Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15429237#comment-15429237
 ] 

Weiqing Yang commented on SPARK-16757:
--

Thanks, [~srowen]. When Spark applications run on HDFS, if Spark reads data 
from HDFS or writes data into HDFS, a corresponding operation record with Spark 
caller contexts will be written into hdfs-audit.log. The Spark caller contexts 
are JobID_stageID_stageAttemptId_taskID_attemptNumber and the application's name. 
That can help users better diagnose and understand how specific applications 
impact parts of the Hadoop system and the potential problems they may be 
creating (e.g. overloading the NN). As mentioned in HDFS-9184, for a given 
HDFS operation, it's very helpful to track which upper-level job issued it.
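
For illustration, a caller context of that shape could be assembled and handed to 
HDFS roughly as follows (a hedged sketch assuming Hadoop 2.8+'s 
org.apache.hadoop.ipc.CallerContext API is on the classpath; the exact format string 
and method names in Spark may differ):
{code}
import org.apache.hadoop.ipc.CallerContext

// Builds a context like "SPARK_myApp_JId_1_SId_2_0_TId_10_0" and registers it so
// that subsequent NameNode calls from this thread are tagged in hdfs-audit.log.
def setTaskCallerContext(appName: String, jobId: Int, stageId: Int,
    stageAttemptId: Int, taskId: Long, attemptNumber: Int): Unit = {
  val context = s"SPARK_${appName}_JId_${jobId}_SId_${stageId}_${stageAttemptId}" +
    s"_TId_${taskId}_${attemptNumber}"
  CallerContext.setCurrent(new CallerContext.Builder(context).build())
}
{code}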

> Set up caller context to HDFS
> -
>
> Key: SPARK-16757
> URL: https://issues.apache.org/jira/browse/SPARK-16757
> Project: Spark
>  Issue Type: Sub-task
>Reporter: Weiqing Yang
>
> In this jira, Spark will invoke hadoop caller context api to set up its 
> caller context to HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16757) Set up caller context to HDFS

2016-08-16 Thread Weiqing Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15423730#comment-15423730
 ] 

Weiqing Yang commented on SPARK-16757:
--

Hi [~srowen], could you help review this PR, please?

> Set up caller context to HDFS
> -
>
> Key: SPARK-16757
> URL: https://issues.apache.org/jira/browse/SPARK-16757
> Project: Spark
>  Issue Type: Sub-task
>Reporter: Weiqing Yang
>
> In this jira, Spark will invoke hadoop caller context api to set up its 
> caller context to HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-16760) Pass 'jobId' to Task

2016-08-15 Thread Weiqing Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiqing Yang closed SPARK-16760.

Resolution: Duplicate

The code for this jira has been put into the PR of SPARK-16757

> Pass 'jobId' to Task
> 
>
> Key: SPARK-16760
> URL: https://issues.apache.org/jira/browse/SPARK-16760
> Project: Spark
>  Issue Type: Sub-task
>Reporter: Weiqing Yang
>
> In the end, the spark caller context written into HDFS log will associate 
> with task id, stage id, job id, app id, etc, but now Task does not know any 
> job information, so job id will be passed to Task in the patch of this jira. 
> That is good for Spark users to identify tasks especially if Spark supports 
> multi-tenant environment in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16966) App Name is a randomUUID even when "spark.app.name" exists

2016-08-13 Thread Weiqing Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420116#comment-15420116
 ] 

Weiqing Yang commented on SPARK-16966:
--

[~srowen] Thanks for the new PR and review.

> App Name is a randomUUID even when "spark.app.name" exists
> --
>
> Key: SPARK-16966
> URL: https://issues.apache.org/jira/browse/SPARK-16966
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Weiqing Yang
>Assignee: Sean Owen
> Fix For: 2.0.1, 2.1.0
>
>
> When submitting an application with "--name":
> ./bin/spark-submit --name myApplicationTest --verbose --executor-cores 3 
> --num-executors 1 --master yarn --deploy-mode client --class 
> org.apache.spark.examples.SparkKMeans 
> examples/target/original-spark-examples_2.11-2.1.0-SNAPSHOT.jar 
> hdfs://localhost:9000/lr_big.txt 2 5
> In the history server UI:
> App ID: application_1470694797714_0016
> App Name: 70c06dc5-1b99-4b4a-a826-ea27497e977b
> The App Name should not be a randomUUID 
> "70c06dc5-1b99-4b4a-a826-ea27497e977b"  since the "spark.app.name" was 
> myApplicationTest.
> The application "org.apache.spark.examples.SparkKMeans" above did not invoke 
> ".appName()". 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16966) App Name is a randomUUID even when "spark.app.name" exists

2016-08-10 Thread Weiqing Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414778#comment-15414778
 ] 

Weiqing Yang commented on SPARK-16966:
--

Thanks for the feedback. I will update my PR to remove those three lines of 
code.

> App Name is a randomUUID even when "spark.app.name" exists
> --
>
> Key: SPARK-16966
> URL: https://issues.apache.org/jira/browse/SPARK-16966
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Weiqing Yang
>
> When submitting an application with "--name":
> ./bin/spark-submit --name myApplicationTest --verbose --executor-cores 3 
> --num-executors 1 --master yarn --deploy-mode client --class 
> org.apache.spark.examples.SparkKMeans 
> examples/target/original-spark-examples_2.11-2.1.0-SNAPSHOT.jar 
> hdfs://localhost:9000/lr_big.txt 2 5
> In the history server UI:
> App ID: application_1470694797714_0016
> App Name: 70c06dc5-1b99-4b4a-a826-ea27497e977b
> The App Name should not be a randomUUID 
> "70c06dc5-1b99-4b4a-a826-ea27497e977b"  since the "spark.app.name" was 
> myApplicationTest.
> The application "org.apache.spark.examples.SparkKMeans" above did not invoke 
> ".appName()". 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16986) "Started" time, "Completed" time and "Last Updated" time in history server UI are not user local time

2016-08-09 Thread Weiqing Yang (JIRA)
Weiqing Yang created SPARK-16986:


 Summary: "Started" time, "Completed" time and "Last Updated" time 
in history server UI are not user local time
 Key: SPARK-16986
 URL: https://issues.apache.org/jira/browse/SPARK-16986
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Reporter: Weiqing Yang
Priority: Minor


Currently, "Started" time, "Completed" time and "Last Updated" time in history 
server UI are GMT. They should be the user local time.
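
A minimal sketch of the intended rendering, using java.time (illustrative only; the 
actual UI fix may well do this conversion client-side instead):
{code}
import java.time.{Instant, ZoneId}
import java.time.format.DateTimeFormatter

// Formats an epoch-millis timestamp in the viewer's local time zone instead of GMT.
def toLocalDisplay(epochMillis: Long, zone: ZoneId = ZoneId.systemDefault()): String =
  DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss z")
    .format(Instant.ofEpochMilli(epochMillis).atZone(zone))
{code}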



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16966) App Name is a randomUUID even when "spark.app.name" exists

2016-08-09 Thread Weiqing Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413921#comment-15413921
 ] 

Weiqing Yang commented on SPARK-16966:
--

Yes, it will be the name of the class being executed.

> App Name is a randomUUID even when "spark.app.name" exists
> --
>
> Key: SPARK-16966
> URL: https://issues.apache.org/jira/browse/SPARK-16966
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Weiqing Yang
>
> When submitting an application with "--name":
> ./bin/spark-submit --name myApplicationTest --verbose --executor-cores 3 
> --num-executors 1 --master yarn --deploy-mode client --class 
> org.apache.spark.examples.SparkKMeans 
> examples/target/original-spark-examples_2.11-2.1.0-SNAPSHOT.jar 
> hdfs://localhost:9000/lr_big.txt 2 5
> In the history server UI:
> App ID: application_1470694797714_0016
> App Name: 70c06dc5-1b99-4b4a-a826-ea27497e977b
> The App Name should not be a randomUUID 
> "70c06dc5-1b99-4b4a-a826-ea27497e977b"  since the "spark.app.name" was 
> myApplicationTest.
> The application "org.apache.spark.examples.SparkKMeans" above did not invoke 
> ".appName()". 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-16966) App Name is a randomUUID even when "spark.app.name" exists

2016-08-09 Thread Weiqing Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413852#comment-15413852
 ] 

Weiqing Yang edited comment on SPARK-16966 at 8/9/16 5:04 PM:
--

Thanks for the quick feedback.

If "--name" is not configured and appName() is not called,  "spark.app.name" 
will be "mainClass"
---
 // Set name from main class if not given
name = Option(name).orElse(Option(mainClass)).orNull
--

If "mainClass" is always there, I think removing those three lines of code will 
be safe, but for pyspark and sparkr, the mainclass might be none, is it safe to 
remove those three lines of code? What do you think? [~srowen][~jerryshao]


was (Author: weiqingyang):
Thanks for the quick feedback.

If "--name" is not configured and appName() does not be called,  
"spark.app.name" will be "mainClass"
---
 // Set name from main class if not given
name = Option(name).orElse(Option(mainClass)).orNull
--

If "mainClass" is always there, I think removing those three lines of code will 
be safe, but for pyspark and sparkr, the mainclass might be none, is it safe to 
remove those three lines of code? What do you think? [~srowen][~jerryshao]

> App Name is a randomUUID even when "spark.app.name" exists
> --
>
> Key: SPARK-16966
> URL: https://issues.apache.org/jira/browse/SPARK-16966
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Weiqing Yang
>
> When submitting an application with "--name":
> ./bin/spark-submit --name myApplicationTest --verbose --executor-cores 3 
> --num-executors 1 --master yarn --deploy-mode client --class 
> org.apache.spark.examples.SparkKMeans 
> examples/target/original-spark-examples_2.11-2.1.0-SNAPSHOT.jar 
> hdfs://localhost:9000/lr_big.txt 2 5
> In the history server UI:
> App ID: application_1470694797714_0016
> App Name: 70c06dc5-1b99-4b4a-a826-ea27497e977b
> The App Name should not be a randomUUID 
> "70c06dc5-1b99-4b4a-a826-ea27497e977b"  since the "spark.app.name" was 
> myApplicationTest.
> The application "org.apache.spark.examples.SparkKMeans" above did not invoke 
> ".appName()". 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16966) App Name is a randomUUID even when "spark.app.name" exists

2016-08-09 Thread Weiqing Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413852#comment-15413852
 ] 

Weiqing Yang commented on SPARK-16966:
--

Thanks for the quick feedback.

If "--name" is not configured and appName() does not be called,  
"spark.app.name" will be "mainClass"
---
 // Set name from main class if not given
name = Option(name).orElse(Option(mainClass)).orNull
--

If "mainClass" is always there, I think removing those three lines of code will 
be safe, but for pyspark and sparkr, the mainclass might be none, is it safe to 
remove those three lines of code? What do you think? [~srowen][~jerryshao]
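
For reference, the fallback order under discussion is roughly the following (a 
minimal sketch assuming a SparkConf is at hand; not the actual SparkSession code path):
{code}
import java.util.UUID
import org.apache.spark.SparkConf

// Prefer an explicitly configured name, then the main class, and only then a random UUID.
def resolveAppName(conf: SparkConf, mainClass: Option[String]): String =
  conf.getOption("spark.app.name")
    .orElse(mainClass)
    .getOrElse(UUID.randomUUID().toString)
{code}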

> App Name is a randomUUID even when "spark.app.name" exists
> --
>
> Key: SPARK-16966
> URL: https://issues.apache.org/jira/browse/SPARK-16966
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Weiqing Yang
>
> When submitting an application with "--name":
> ./bin/spark-submit --name myApplicationTest --verbose --executor-cores 3 
> --num-executors 1 --master yarn --deploy-mode client --class 
> org.apache.spark.examples.SparkKMeans 
> examples/target/original-spark-examples_2.11-2.1.0-SNAPSHOT.jar 
> hdfs://localhost:9000/lr_big.txt 2 5
> In the history server UI:
> App ID: application_1470694797714_0016
> App Name: 70c06dc5-1b99-4b4a-a826-ea27497e977b
> The App Name should not be a randomUUID 
> "70c06dc5-1b99-4b4a-a826-ea27497e977b"  since the "spark.app.name" was 
> myApplicationTest.
> The application "org.apache.spark.examples.SparkKMeans" above did not invoke 
> ".appName()". 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-16966) App Name is a randomUUID even when "spark.app.name" exists

2016-08-09 Thread Weiqing Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413810#comment-15413810
 ] 

Weiqing Yang edited comment on SPARK-16966 at 8/9/16 4:39 PM:
--

In the tests, I modified org.apache.spark.examples.SparkKMeans (code from the 
Spark master branch), commenting out its appName() call. 

   val spark = SparkSession
  .builder
  // .appName("SparkKMeans")
  .getOrCreate()


was (Author: weiqingyang):
In the tests, I modified org.apache.spark.examples.SparkKMeans to comment its 
appName() call. 

   val spark = SparkSession
  .builder
  // .appName("SparkKMeans")
  .getOrCreate()

> App Name is a randomUUID even when "spark.app.name" exists
> --
>
> Key: SPARK-16966
> URL: https://issues.apache.org/jira/browse/SPARK-16966
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Weiqing Yang
>
> When submitting an application with "--name":
> ./bin/spark-submit --name myApplicationTest --verbose --executor-cores 3 
> --num-executors 1 --master yarn --deploy-mode client --class 
> org.apache.spark.examples.SparkKMeans 
> examples/target/original-spark-examples_2.11-2.1.0-SNAPSHOT.jar 
> hdfs://localhost:9000/lr_big.txt 2 5
> In the history server UI:
> App ID: application_1470694797714_0016
> App Name: 70c06dc5-1b99-4b4a-a826-ea27497e977b
> The App Name should not be a randomUUID 
> "70c06dc5-1b99-4b4a-a826-ea27497e977b"  since the "spark.app.name" was 
> myApplicationTest.
> The application "org.apache.spark.examples.SparkKMeans" above did not invoke 
> ".appName()". 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-16966) App Name is a randomUUID even when "spark.app.name" exists

2016-08-09 Thread Weiqing Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413810#comment-15413810
 ] 

Weiqing Yang edited comment on SPARK-16966 at 8/9/16 4:34 PM:
--

In the tests, I modified org.apache.spark.examples.SparkKMeans to comment its 
appName() call. 

   val spark = SparkSession
  .builder
  // .appName("SparkKMeans")
  .getOrCreate()


was (Author: weiqingyang):
In the tests, I modified org.apache.spark.examples.SparkKMeans example to 
comment its appName() call. 

   val spark = SparkSession
  .builder
  // .appName("SparkKMeans")
  .getOrCreate()

> App Name is a randomUUID even when "spark.app.name" exists
> --
>
> Key: SPARK-16966
> URL: https://issues.apache.org/jira/browse/SPARK-16966
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Weiqing Yang
>
> When submitting an application with "--name":
> ./bin/spark-submit --name myApplicationTest --verbose --executor-cores 3 
> --num-executors 1 --master yarn --deploy-mode client --class 
> org.apache.spark.examples.SparkKMeans 
> examples/target/original-spark-examples_2.11-2.1.0-SNAPSHOT.jar 
> hdfs://localhost:9000/lr_big.txt 2 5
> In the history server UI:
> App ID: application_1470694797714_0016
> App Name: 70c06dc5-1b99-4b4a-a826-ea27497e977b
> The App Name should not be a randomUUID 
> "70c06dc5-1b99-4b4a-a826-ea27497e977b"  since the "spark.app.name" was 
> myApplicationTest.
> The application "org.apache.spark.examples.SparkKMeans" above did not invoke 
> ".appName()". 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16966) App Name is a randomUUID even when "spark.app.name" exists

2016-08-09 Thread Weiqing Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413810#comment-15413810
 ] 

Weiqing Yang commented on SPARK-16966:
--

In the tests, I modified org.apache.spark.examples.SparkKMeans example to 
comment its appName() call. 

   val spark = SparkSession
  .builder
  // .appName("SparkKMeans")
  .getOrCreate()

> App Name is a randomUUID even when "spark.app.name" exists
> --
>
> Key: SPARK-16966
> URL: https://issues.apache.org/jira/browse/SPARK-16966
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Weiqing Yang
>
> When submitting an application with "--name":
> ./bin/spark-submit --name myApplicationTest --verbose --executor-cores 3 
> --num-executors 1 --master yarn --deploy-mode client --class 
> org.apache.spark.examples.SparkKMeans 
> examples/target/original-spark-examples_2.11-2.1.0-SNAPSHOT.jar 
> hdfs://localhost:9000/lr_big.txt 2 5
> In the history server UI:
> App ID: application_1470694797714_0016
> App Name: 70c06dc5-1b99-4b4a-a826-ea27497e977b
> The App Name should not be a randomUUID 
> "70c06dc5-1b99-4b4a-a826-ea27497e977b"  since the "spark.app.name" was 
> myApplicationTest.
> The application "org.apache.spark.examples.SparkKMeans" above did not invoke 
> ".appName()". 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-16966) App Name is a randomUUID even when "spark.app.name" exists

2016-08-09 Thread Weiqing Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413810#comment-15413810
 ] 

Weiqing Yang edited comment on SPARK-16966 at 8/9/16 4:35 PM:
--

In the tests, I modified org.apache.spark.examples.SparkKMeans to comment its 
appName() call. 

   val spark = SparkSession
  .builder
  // .appName("SparkKMeans")
  .getOrCreate()


was (Author: weiqingyang):
In the tests, I modified org.apache.spark.examples.SparkKMeans to comment its 
appName() call. 

   val spark = SparkSession
  .builder
  // .appName("SparkKMeans")
  .getOrCreate()

> App Name is a randomUUID even when "spark.app.name" exists
> --
>
> Key: SPARK-16966
> URL: https://issues.apache.org/jira/browse/SPARK-16966
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Weiqing Yang
>
> When submitting an application with "--name":
> ./bin/spark-submit --name myApplicationTest --verbose --executor-cores 3 
> --num-executors 1 --master yarn --deploy-mode client --class 
> org.apache.spark.examples.SparkKMeans 
> examples/target/original-spark-examples_2.11-2.1.0-SNAPSHOT.jar 
> hdfs://localhost:9000/lr_big.txt 2 5
> In the history server UI:
> App ID: application_1470694797714_0016
> App Name: 70c06dc5-1b99-4b4a-a826-ea27497e977b
> The App Name should not be a randomUUID 
> "70c06dc5-1b99-4b4a-a826-ea27497e977b"  since the "spark.app.name" was 
> myApplicationTest.
> The application "org.apache.spark.examples.SparkKMeans" above did not invoke 
> ".appName()". 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16966) App Name is a randomUUID even when "spark.app.name" exists

2016-08-09 Thread Weiqing Yang (JIRA)
Weiqing Yang created SPARK-16966:


 Summary: App Name is a randomUUID even when "spark.app.name" exists
 Key: SPARK-16966
 URL: https://issues.apache.org/jira/browse/SPARK-16966
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: Weiqing Yang


When submitting an application with "--name":

./bin/spark-submit --name myApplicationTest --verbose --executor-cores 3 
--num-executors 1 --master yarn --deploy-mode client --class 
org.apache.spark.examples.SparkKMeans 
examples/target/original-spark-examples_2.11-2.1.0-SNAPSHOT.jar 
hdfs://localhost:9000/lr_big.txt 2 5

In the history server UI:
App ID: application_1470694797714_0016
App Name: 70c06dc5-1b99-4b4a-a826-ea27497e977b

The App Name should not be a randomUUID "70c06dc5-1b99-4b4a-a826-ea27497e977b"  
since the "spark.app.name" was myApplicationTest.

The application "org.apache.spark.examples.SparkKMeans" above did not invoke 
".appName()". 




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16945) Fix Java Lint errors

2016-08-07 Thread Weiqing Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiqing Yang updated SPARK-16945:
-
Component/s: Build

> Fix Java Lint errors
> 
>
> Key: SPARK-16945
> URL: https://issues.apache.org/jira/browse/SPARK-16945
> Project: Spark
>  Issue Type: Task
>  Components: Build
>Reporter: Weiqing Yang
>Priority: Minor
>
> There are following errors when running dev/lint-java:
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/VariableLengthRowBasedKeyValueBatch.java:[42,10]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/VariableLengthRowBasedKeyValueBatch.java:[97,10]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/VariableLengthRowBasedKeyValueBatch.java:[113,10]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/RowBasedKeyValueBatch.java:[126,11]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/FixedLengthRowBasedKeyValueBatch.java:[36,11]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/FixedLengthRowBasedKeyValueBatch.java:[46,10]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/FixedLengthRowBasedKeyValueBatch.java:[74,10]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/FixedLengthRowBasedKeyValueBatch.java:[93,13]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/FixedLengthRowBasedKeyValueBatch.java:[106,10]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java:[224]
>  (sizes) LineLength: Line is longer than 100 characters (found 104).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16945) Fix Java Lint errors

2016-08-07 Thread Weiqing Yang (JIRA)
Weiqing Yang created SPARK-16945:


 Summary: Fix Java Lint errors
 Key: SPARK-16945
 URL: https://issues.apache.org/jira/browse/SPARK-16945
 Project: Spark
  Issue Type: Task
Reporter: Weiqing Yang
Priority: Minor


The following errors are reported when running dev/lint-java:
[ERROR] 
src/main/java/org/apache/spark/sql/catalyst/expressions/VariableLengthRowBasedKeyValueBatch.java:[42,10]
 (modifier) RedundantModifier: Redundant 'final' modifier.
[ERROR] 
src/main/java/org/apache/spark/sql/catalyst/expressions/VariableLengthRowBasedKeyValueBatch.java:[97,10]
 (modifier) RedundantModifier: Redundant 'final' modifier.
[ERROR] 
src/main/java/org/apache/spark/sql/catalyst/expressions/VariableLengthRowBasedKeyValueBatch.java:[113,10]
 (modifier) RedundantModifier: Redundant 'final' modifier.
[ERROR] 
src/main/java/org/apache/spark/sql/catalyst/expressions/RowBasedKeyValueBatch.java:[126,11]
 (modifier) RedundantModifier: Redundant 'final' modifier.
[ERROR] 
src/main/java/org/apache/spark/sql/catalyst/expressions/FixedLengthRowBasedKeyValueBatch.java:[36,11]
 (modifier) RedundantModifier: Redundant 'final' modifier.
[ERROR] 
src/main/java/org/apache/spark/sql/catalyst/expressions/FixedLengthRowBasedKeyValueBatch.java:[46,10]
 (modifier) RedundantModifier: Redundant 'final' modifier.
[ERROR] 
src/main/java/org/apache/spark/sql/catalyst/expressions/FixedLengthRowBasedKeyValueBatch.java:[74,10]
 (modifier) RedundantModifier: Redundant 'final' modifier.
[ERROR] 
src/main/java/org/apache/spark/sql/catalyst/expressions/FixedLengthRowBasedKeyValueBatch.java:[93,13]
 (modifier) RedundantModifier: Redundant 'final' modifier.
[ERROR] 
src/main/java/org/apache/spark/sql/catalyst/expressions/FixedLengthRowBasedKeyValueBatch.java:[106,10]
 (modifier) RedundantModifier: Redundant 'final' modifier.
[ERROR] 
src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java:[224] 
(sizes) LineLength: Line is longer than 100 characters (found 104).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15857) Add Caller Context in Spark

2016-07-27 Thread Weiqing Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15396677#comment-15396677
 ] 

Weiqing Yang commented on SPARK-15857:
--

That single PR is for subtasks 1, 2 and 4. Based on the review feedback, I 
will make a new, smaller PR for each sub-task. That will make review easier. 
I will close that PR and submit the design doc before any new PR.

> Add Caller Context in Spark
> ---
>
> Key: SPARK-15857
> URL: https://issues.apache.org/jira/browse/SPARK-15857
> Project: Spark
>  Issue Type: New Feature
>Reporter: Weiqing Yang
>
> Hadoop has implemented a feature of log tracing – caller context (Jira: 
> HDFS-9184 and YARN-4349). The motivation is to better diagnose and understand 
> how specific applications impacting parts of the Hadoop system and potential 
> problems they may be creating (e.g. overloading NN). As HDFS mentioned in 
> HDFS-9184, for a given HDFS operation, it's very helpful to track which upper 
> level job issues it. The upper level callers may be specific Oozie tasks, MR 
> jobs, hive queries, Spark jobs. 
> Hadoop ecosystems like MapReduce, Tez (TEZ-2851), Hive (HIVE-12249, 
> HIVE-12254) and Pig(PIG-4714) have implemented their caller contexts. Those 
> systems invoke HDFS client API and Yarn client API to setup caller context, 
> and also expose an API to pass in caller context into it.
> Lots of Spark applications are running on Yarn/HDFS. Spark can also implement 
> its caller context via invoking HDFS/Yarn API, and also expose an API to its 
> upstream applications to set up their caller contexts. In the end, the spark 
> caller context written into Yarn log / HDFS log can associate with task id, 
> stage id, job id and app id. That is also very good for Spark users to 
> identify tasks especially if Spark supports multi-tenant environment in the 
> future.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15857) Add Caller Context in Spark

2016-07-27 Thread Weiqing Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15396583#comment-15396583
 ] 

Weiqing Yang commented on SPARK-15857:
--

To make review easier, subtasks have been created.

> Add Caller Context in Spark
> ---
>
> Key: SPARK-15857
> URL: https://issues.apache.org/jira/browse/SPARK-15857
> Project: Spark
>  Issue Type: New Feature
>Reporter: Weiqing Yang
>
> Hadoop has implemented a feature of log tracing – caller context (Jira: 
> HDFS-9184 and YARN-4349). The motivation is to better diagnose and understand 
> how specific applications impacting parts of the Hadoop system and potential 
> problems they may be creating (e.g. overloading NN). As HDFS mentioned in 
> HDFS-9184, for a given HDFS operation, it's very helpful to track which upper 
> level job issues it. The upper level callers may be specific Oozie tasks, MR 
> jobs, hive queries, Spark jobs. 
> Hadoop ecosystems like MapReduce, Tez (TEZ-2851), Hive (HIVE-12249, 
> HIVE-12254) and Pig(PIG-4714) have implemented their caller contexts. Those 
> systems invoke HDFS client API and Yarn client API to setup caller context, 
> and also expose an API to pass in caller context into it.
> Lots of Spark applications are running on Yarn/HDFS. Spark can also implement 
> its caller context via invoking HDFS/Yarn API, and also expose an API to its 
> upstream applications to set up their caller contexts. In the end, the spark 
> caller context written into Yarn log / HDFS log can associate with task id, 
> stage id, job id and app id. That is also very good for Spark users to 
> identify tasks especially if Spark supports multi-tenant environment in the 
> future.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16760) Pass 'jobId' to Task

2016-07-27 Thread Weiqing Yang (JIRA)
Weiqing Yang created SPARK-16760:


 Summary: Pass 'jobId' to Task
 Key: SPARK-16760
 URL: https://issues.apache.org/jira/browse/SPARK-16760
 Project: Spark
  Issue Type: Sub-task
Reporter: Weiqing Yang


In the end, the Spark caller context written into the HDFS log will be associated 
with the task id, stage id, job id, app id, etc., but currently Task does not know 
any job information, so the job id will be passed to Task in the patch of this jira. 
That will help Spark users identify tasks, especially if Spark supports a 
multi-tenant environment in the future.
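
A hypothetical sketch of what the change could look like (field names and placement 
are illustrative, not Spark's actual Task signature):
{code}
// jobId would be threaded through from the scheduler so that the caller context built
// on the executor side can include it alongside stage, partition and attempt information.
abstract class Task[T](
    val stageId: Int,
    val partitionId: Int,
    val jobId: Option[Int]) {
  def runTask(): T
}
{code}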



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16759) Spark expose an API to pass in Caller Context into it

2016-07-27 Thread Weiqing Yang (JIRA)
Weiqing Yang created SPARK-16759:


 Summary: Spark expose an API to pass in Caller Context into it
 Key: SPARK-16759
 URL: https://issues.apache.org/jira/browse/SPARK-16759
 Project: Spark
  Issue Type: Sub-task
Reporter: Weiqing Yang


The API should expose a way for upstream components to inject a caller 
context. The caller context will be in the form of a tuple (caller context type, 
caller context id). The initial implementation will require support of at least 
one primary caller context. Future versions may need secondary caller contexts 
to also be provided via the same interface.
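
A hypothetical sketch of such a hook (all names invented for illustration; not an 
actual Spark API):
{code}
// An upstream component (e.g. an Oozie task or Hive query) registers its own
// (contextType, contextId) pair; Spark would then prepend it to the caller context
// it writes to HDFS/YARN.
case class UpstreamCallerContext(contextType: String, contextId: String)

object SparkCallerContext {
  @volatile private var upstream: Option[UpstreamCallerContext] = None

  def register(ctx: UpstreamCallerContext): Unit = { upstream = Some(ctx) }

  def render(sparkPart: String): String =
    upstream.map(c => s"${c.contextType}_${c.contextId}_$sparkPart").getOrElse(sparkPart)
}
{code}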



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16758) Set up caller context to YARN

2016-07-27 Thread Weiqing Yang (JIRA)
Weiqing Yang created SPARK-16758:


 Summary: Set up caller context to YARN
 Key: SPARK-16758
 URL: https://issues.apache.org/jira/browse/SPARK-16758
 Project: Spark
  Issue Type: Sub-task
Reporter: Weiqing Yang


In this jira, Spark will invoke hadoop caller context api to set up its caller 
context to YARN.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16757) Set up caller context to HDFS

2016-07-27 Thread Weiqing Yang (JIRA)
Weiqing Yang created SPARK-16757:


 Summary: Set up caller context to HDFS
 Key: SPARK-16757
 URL: https://issues.apache.org/jira/browse/SPARK-16757
 Project: Spark
  Issue Type: Sub-task
Reporter: Weiqing Yang


In this jira, Spark will invoke hadoop caller context api to set up its caller 
context to HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16595) Spark History server Rest Api gives Application not found error for yarn-cluster mode

2016-07-22 Thread Weiqing Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15390162#comment-15390162
 ] 

Weiqing Yang commented on SPARK-16595:
--

This issue could not be reproduced.

> Spark History server Rest Api gives Application not found error for 
> yarn-cluster mode
> -
>
> Key: SPARK-16595
> URL: https://issues.apache.org/jira/browse/SPARK-16595
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.1
>Reporter: Yesha Vora
>
> Scenario:
> * Start SparkPi application in Spark1 using yarn-cluster mode 
> (application_1468686376753_0041) 
> * After application finishes validate application exists in respective Spark 
> History server.
> {code}
> Error loading url 
> http://xx.xx.xx.xx:18080/api/v1/applications/application_1468686376753_0041/1/executors
> HTTP Code: 404
> HTTP Data: no such app: application_1468686376753_0041{code}
> {code:title=spark HS log}
> 16/07/16 15:55:29 INFO FsHistoryProvider: Replaying log path: 
> hdfs://xx.xx.xx.xx:8020/spark-history/application_1468678823755_0049.inprogress
> 16/07/16 15:56:20 INFO FsHistoryProvider: Replaying log path: 
> hdfs://xx.xx.xx.xx:8020/spark-history/application_1468678823755_0049
> 16/07/16 16:23:14 INFO FsHistoryProvider: Replaying log path: 
> hdfs://xx.xx.xx.xx:8020/spark-history/application_1468678823755_0061.inprogress
> 16/07/16 16:24:14 INFO FsHistoryProvider: Replaying log path: 
> hdfs://xx.xx.xx.xx:8020/spark-history/application_1468678823755_0061
> 16/07/16 17:42:32 INFO FsHistoryProvider: Replaying log path: 
> hdfs://xx.xx.xx.xx:8020/spark-history/local-1468690940553.inprogress
> 16/07/16 17:43:22 INFO FsHistoryProvider: Replaying log path: 
> hdfs://xx.xx.xx.xx:8020/spark-history/local-1468690940553
> 16/07/16 17:43:44 INFO FsHistoryProvider: Replaying log path: 
> hdfs://xx.xx.xx.xx:8020/spark-history/local-1468691017376.inprogress
> 16/07/16 17:44:34 INFO FsHistoryProvider: Replaying log path: 
> hdfs://xx.xx.xx.xx:8020/spark-history/local-1468691017376
> 16/07/16 18:53:10 INFO FsHistoryProvider: Replaying log path: 
> hdfs://xx.xx.xx.xx:8020/spark-history/application_1468686376753_0041_1.inprogress
> 16/07/16 19:03:26 INFO PackagesResourceConfig: Scanning for root resource and 
> provider classes in the packages:
>   org.apache.spark.status.api.v1
> 16/07/16 19:03:35 INFO ScanningResourceConfig: Root resource classes found:
>   class org.apache.spark.status.api.v1.ApiRootResource
> 16/07/16 19:03:35 INFO ScanningResourceConfig: Provider classes found:
>   class org.apache.spark.status.api.v1.JacksonMessageWriter
> 16/07/16 19:03:35 INFO WebApplicationImpl: Initiating Jersey application, 
> version 'Jersey: 1.9 09/02/2011 11:17 AM'
> 16/07/16 19:03:36 INFO SecurityManager: Changing view acls to: spark
> 16/07/16 19:03:36 INFO SecurityManager: Changing modify acls to: spark
> 16/07/16 19:03:36 INFO SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users with view permissions: Set(spark); users 
> with modify permissions: Set(spark)
> 16/07/16 19:03:36 INFO ApplicationCache: Failed to load application attempt 
> application_1468686376753_0041/Some(1)
> 16/07/16 19:04:21 INFO FsHistoryProvider: Replaying log path: 
> hdfs://xx.xx.xx.xx:8020/spark-history/application_1468686376753_0043.inprogress
> 16/07/16 19:12:02 INFO FsHistoryProvider: Replaying log path: 
> hdfs://xx.xx.xx.xx:8020/spark-history/application_1468686376753_0043
> 16/07/16 19:16:11 INFO SecurityManager: Changing view acls to: spark
> 16/07/16 19:16:11 INFO SecurityManager: Changing modify acls to: spark
> 16/07/16 19:16:11 INFO SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users with view permissions: Set(spark); users 
> with modify permissions: Set(spark)
> 16/07/16 19:16:11 INFO FsHistoryProvider: Replaying log path: 
> hdfs://xx.xx.xx.xx:8020/spark-history/application_1468686376753_0043
> 16/07/16 19:16:22 INFO SecurityManager: Changing acls enabled to: false
> 16/07/16 19:16:22 INFO SecurityManager: Changing admin acls to:
> 16/07/16 19:16:22 INFO SecurityManager: Changing view acls to: hrt_qa{code}
> {code}
> hdfs@xxx:/var/log/spark$ hdfs dfs -ls /spark-history/
> Found 8 items
> -rwxrwx---   3 hrt_qa hadoop  28793 2016-07-16 15:56 
> /spark-history/application_1468678823755_0049
> -rwxrwx---   3 hrt_qa hadoop  28763 2016-07-16 16:24 
> /spark-history/application_1468678823755_0061
> -rwxrwx---   3 hrt_qa hadoop   58868885 2016-07-16 18:59 
> /spark-history/application_1468686376753_0041_1
> -rwxrwx---   3 hrt_qa hadoop   58841982 2016-07-16 19:11 
> /spark-history/application_1468686376753_0043
> -rwxrwx---   3 hive   hadoop   5823 2016-07-16 11:38 
> /spark-history/local-1468666932940
> -rwxrwx---   3 hive   hadoop   5757 2016-07-16 22:44 
> 

[jira] [Reopened] (SPARK-15923) Spark Application rest api returns "no such app: "

2016-07-12 Thread Weiqing Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiqing Yang reopened SPARK-15923:
--

Reopening the jira to update monitoring.md.

> Spark Application rest api returns "no such app: "
> -
>
> Key: SPARK-15923
> URL: https://issues.apache.org/jira/browse/SPARK-15923
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.1
>Reporter: Yesha Vora
>
> Env : secure cluster
> Scenario:
> * Run SparkPi application in yarn-client or yarn-cluster mode
> * After application finishes, check Spark HS rest api to get details like 
> jobs / executor etc. 
> {code}
> http://:18080/api/v1/applications/application_1465778870517_0001/1/executors{code}
>  
> Rest api return HTTP Code: 404 and prints "HTTP Data: no such app: 
> application_1465778870517_0001"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-15923) Spark Application rest api returns "no such app: "

2016-07-05 Thread Weiqing Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiqing Yang closed SPARK-15923.

Resolution: Won't Fix

> Spark Application rest api returns "no such app: "
> -
>
> Key: SPARK-15923
> URL: https://issues.apache.org/jira/browse/SPARK-15923
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.1
>Reporter: Yesha Vora
>
> Env : secure cluster
> Scenario:
> * Run SparkPi application in yarn-client or yarn-cluster mode
> * After application finishes, check Spark HS rest api to get details like 
> jobs / executor etc. 
> {code}
> http://:18080/api/v1/applications/application_1465778870517_0001/1/executors{code}
>  
> Rest api return HTTP Code: 404 and prints "HTTP Data: no such app: 
> application_1465778870517_0001"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-15923) Spark Application rest api returns "no such app: "

2016-07-05 Thread Weiqing Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363056#comment-15363056
 ] 

Weiqing Yang edited comment on SPARK-15923 at 7/5/16 7:23 PM:
--

[~jerryshao] Thanks for the feedback. After discussing with Yesha, I will close 
this jira.


was (Author: weiqingyang):
[~jerryshao]Thanks for the feedback. After discussed with Yesha, and will close 
this jira.

> Spark Application rest api returns "no such app: "
> -
>
> Key: SPARK-15923
> URL: https://issues.apache.org/jira/browse/SPARK-15923
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.1
>Reporter: Yesha Vora
>
> Env : secure cluster
> Scenario:
> * Run SparkPi application in yarn-client or yarn-cluster mode
> * After application finishes, check Spark HS rest api to get details like 
> jobs / executor etc. 
> {code}
> http://:18080/api/v1/applications/application_1465778870517_0001/1/executors{code}
>  
> Rest api return HTTP Code: 404 and prints "HTTP Data: no such app: 
> application_1465778870517_0001"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15923) Spark Application rest api returns "no such app: "

2016-07-05 Thread Weiqing Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363056#comment-15363056
 ] 

Weiqing Yang commented on SPARK-15923:
--

[~jerryshao] Thanks for the feedback. After discussing with Yesha, I will close 
this jira.

> Spark Application rest api returns "no such app: "
> -
>
> Key: SPARK-15923
> URL: https://issues.apache.org/jira/browse/SPARK-15923
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.1
>Reporter: Yesha Vora
>
> Env : secure cluster
> Scenario:
> * Run SparkPi application in yarn-client or yarn-cluster mode
> * After application finishes, check Spark HS rest api to get details like 
> jobs / executor etc. 
> {code}
> http://:18080/api/v1/applications/application_1465778870517_0001/1/executors{code}
>  
> Rest api returns HTTP Code: 404 and prints "HTTP Data: no such app: 
> application_1465778870517_0001"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15923) Spark Application rest api returns "no such app: "

2016-07-01 Thread Weiqing Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15359912#comment-15359912
 ] 

Weiqing Yang commented on SPARK-15923:
--

Hi [~tgraves] [~ste...@apache.org], could you help review this, please? 
Thanks.

> Spark Application rest api returns "no such app: "
> -
>
> Key: SPARK-15923
> URL: https://issues.apache.org/jira/browse/SPARK-15923
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.1
>Reporter: Yesha Vora
>
> Env : secure cluster
> Scenario:
> * Run SparkPi application in yarn-client or yarn-cluster mode
> * After application finishes, check Spark HS rest api to get details like 
> jobs / executor etc. 
> {code}
> http://:18080/api/v1/applications/application_1465778870517_0001/1/executors{code}
>  
> Rest api returns HTTP Code: 404 and prints "HTTP Data: no such app: 
> application_1465778870517_0001"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15923) Spark Application rest api returns "no such app: "

2016-07-01 Thread Weiqing Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15359872#comment-15359872
 ] 

Weiqing Yang commented on SPARK-15923:
--

Debugged in the cluster. This issue occurs whether the cluster is secure or 
not, and only applications in yarn-client mode are affected.

Detailed jira description:
1. yarn-client mode:
Applications in yarn-client mode do not have an 'attemptId' in their records, 
e.g.:
"id": "application_1465778870517_0001",
"name": "Spark Pi",
"attempts": [
  { "startTime": "2016-06-13T01:07:16.958GMT", "endTime": "2016-06-13T01:09:29.668GMT",
    "sparkUser": "hrt_qa", "completed": true }
]
So when checking the web UI for executor information, the link used is 
http://:18080/history/application_1465778870517_0001/executors/, which shows 
all the executors' information; note that it has no attemptId inside the link. 
On the other hand, calling the rest API 
http://:18080/api/v1/applications/application_1465778870517_0001/1/executors, 
which has attemptId "1" inside, fails with errors like "no such app" and "INFO 
ApplicationCache: Failed to load application attempt 
application_1465778870517_0001/Some(1)". Instead, if you call the rest API 
http://:18080/api/v1/applications/application_1465778870517_0001/executors, 
which has no attemptId inside, all the executors' information is returned.

2. yarn-cluster mode:
Applications in yarn-cluster mode do have an 'attemptId' in their records, 
e.g.:
"id": "application_1465778870517_0002",
"name": "Spark Pi",
"attempts": [
  { "attemptId": "1", "startTime": "2016-06-13T01:12:48.797GMT", "endTime": "2016-06-13T01:14:26.900GMT",
    "sparkUser": "hrt_qa", "completed": true }
]
We can check executor information via both the web UI and the rest API, since 
both of them include attemptId "1":
http://:18080/history/application_1465778870517_0002/1/executors/
http://:18080/api/v1/applications/application_1465778870517_0002/1/executors

Summary:
When checking job/executor information via the rest APIs, the "attemptId" is 
included in the URL. However, in yarn-client mode, there is no attempt ID.

I am going to make a pull request for review.
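
For third parties scripting against the history server, here is a minimal 
Scala sketch of the resulting behaviour (the history server host and the 
application id below are placeholders, not taken from this jira): try the 
attempt-qualified endpoint first and fall back to the unqualified one, since 
yarn-client applications have no attemptId.
{code}
// Sketch only; host and application id are placeholders.
import scala.io.Source
import scala.util.{Failure, Success, Try}

object ExecutorInfo {
  // GET a URL and return the response body; a 404 surfaces as a Failure.
  private def get(url: String): Try[String] = Try(Source.fromURL(url).mkString)

  def main(args: Array[String]): Unit = {
    val base = "http://history-server:18080/api/v1/applications/application_1465778870517_0001"
    // yarn-cluster attempts are addressed as /<attemptId>/executors;
    // yarn-client applications have no attemptId, so fall back to /executors.
    get(s"$base/1/executors").orElse(get(s"$base/executors")) match {
      case Success(json) => println(json)
      case Failure(e)    => println(s"request failed: ${e.getMessage}")
    }
  }
}
{code}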

> Spark Application rest api returns "no such app: "
> -
>
> Key: SPARK-15923
> URL: https://issues.apache.org/jira/browse/SPARK-15923
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.1
>Reporter: Yesha Vora
>
> Env : secure cluster
> Scenario:
> * Run SparkPi application in yarn-client or yarn-cluster mode
> * After application finishes, check Spark HS rest api to get details like 
> jobs / executor etc. 
> {code}
> http://:18080/api/v1/applications/application_1465778870517_0001/1/executors{code}
>  
> Rest api returns HTTP Code: 404 and prints "HTTP Data: no such app: 
> application_1465778870517_0001"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15857) Add Caller Context in Spark

2016-06-09 Thread Weiqing Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15323655#comment-15323655
 ] 

Weiqing Yang commented on SPARK-15857:
--

I will attach the design doc soon.

> Add Caller Context in Spark
> ---
>
> Key: SPARK-15857
> URL: https://issues.apache.org/jira/browse/SPARK-15857
> Project: Spark
>  Issue Type: New Feature
>Reporter: Weiqing Yang
>
> Hadoop has implemented a feature of log tracing – caller context (Jira: 
> HDFS-9184 and YARN-4349). The motivation is to better diagnose and understand 
> how specific applications impact parts of the Hadoop system and what 
> potential problems they may be creating (e.g. overloading the NN). As 
> mentioned in HDFS-9184, for a given HDFS operation, it is very helpful to 
> track which upper-level job issues it. The upper-level callers may be 
> specific Oozie tasks, MR jobs, Hive queries, or Spark jobs.
> Hadoop ecosystem projects like MapReduce, Tez (TEZ-2851), Hive (HIVE-12249, 
> HIVE-12254) and Pig (PIG-4714) have implemented their own caller contexts. 
> Those systems invoke the HDFS client API and the Yarn client API to set up 
> the caller context, and also expose an API for passing a caller context in.
> Lots of Spark applications run on Yarn/HDFS. Spark can also implement its 
> caller context by invoking the HDFS/Yarn APIs, and also expose an API so that 
> its upstream applications can set up their own caller contexts. In the end, 
> the Spark caller context written into the Yarn/HDFS logs can be associated 
> with the task id, stage id, job id and app id. That also makes it much easier 
> for Spark users to identify tasks, especially if Spark supports a 
> multi-tenant environment in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15857) Add Caller Context in Spark

2016-06-09 Thread Weiqing Yang (JIRA)
Weiqing Yang created SPARK-15857:


 Summary: Add Caller Context in Spark
 Key: SPARK-15857
 URL: https://issues.apache.org/jira/browse/SPARK-15857
 Project: Spark
  Issue Type: New Feature
Reporter: Weiqing Yang


Hadoop has implemented a feature of log tracing – caller context (Jira: 
HDFS-9184 and YARN-4349). The motivation is to better diagnose and understand 
how specific applications impact parts of the Hadoop system and what potential 
problems they may be creating (e.g. overloading the NN). As mentioned in 
HDFS-9184, for a given HDFS operation, it is very helpful to track which 
upper-level job issues it. The upper-level callers may be specific Oozie tasks, 
MR jobs, Hive queries, or Spark jobs.

Hadoop ecosystem projects like MapReduce, Tez (TEZ-2851), Hive (HIVE-12249, 
HIVE-12254) and Pig (PIG-4714) have implemented their own caller contexts. 
Those systems invoke the HDFS client API and the Yarn client API to set up the 
caller context, and also expose an API for passing a caller context in.

Lots of Spark applications run on Yarn/HDFS. Spark can also implement its 
caller context by invoking the HDFS/Yarn APIs, and also expose an API so that 
its upstream applications can set up their own caller contexts. In the end, 
the Spark caller context written into the Yarn/HDFS logs can be associated 
with the task id, stage id, job id and app id. That also makes it much easier 
for Spark users to identify tasks, especially if Spark supports a multi-tenant 
environment in the future.
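
To make the idea concrete, below is a minimal sketch (an assumption about the 
eventual design, not the actual Spark implementation) of how a Spark component 
could tag its HDFS/Yarn RPCs through Hadoop's CallerContext API (available 
since Hadoop 2.8); the context string format is only illustrative.
{code}
// Sketch only: tag subsequent HDFS/YARN RPCs on this thread with Spark identifiers.
// Requires Hadoop 2.8+ on the classpath; the context format is illustrative.
import org.apache.hadoop.ipc.CallerContext

object SparkCallerContextSketch {
  def set(appId: String, jobId: Int, stageId: Int, taskId: Long): Unit = {
    val context = s"SPARK_AppId_${appId}_JId_${jobId}_SId_${stageId}_TId_${taskId}"
    // The NameNode / ResourceManager record this string in their audit logs
    // for RPCs issued after this call on the current thread.
    CallerContext.setCurrent(new CallerContext.Builder(context).build())
  }
}

// e.g. SparkCallerContextSketch.set("application_1465778870517_0002", 0, 1, 7L)
{code}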



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15707) Make Code Neat - Use map instead of if check

2016-06-01 Thread Weiqing Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15310794#comment-15310794
 ] 

Weiqing Yang commented on SPARK-15707:
--

I am going to send pull request for review.

> Make Code Neat - Use map instead of if check
> 
>
> Key: SPARK-15707
> URL: https://issues.apache.org/jira/browse/SPARK-15707
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Weiqing Yang
>Priority: Trivial
>
> In the forType function of object RandomDataGenerator, there is a piece of 
> code like the following:
> 
> if (maybeSqlTypeGenerator.isDefined) {
>   val sqlTypeGenerator = maybeSqlTypeGenerator.get
>   val generator = () => {
>     ….
>   }
>   Some(generator)
> } else {
>   None
> }
> ---
> It is better to use maybeSqlTypeGenerator.map instead of the ‘if … else …’ 
> above, since ‘map’ already performs that check internally. That will make 
> the code neater.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15707) Make Code Neat - Use map instead of if check

2016-06-01 Thread Weiqing Yang (JIRA)
Weiqing Yang created SPARK-15707:


 Summary: Make Code Neat - Use map instead of if check
 Key: SPARK-15707
 URL: https://issues.apache.org/jira/browse/SPARK-15707
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Weiqing Yang
Priority: Trivial


In the forType function of object RandomDataGenerator, there is a piece of code 
like the following:

if (maybeSqlTypeGenerator.isDefined) {
  val sqlTypeGenerator = maybeSqlTypeGenerator.get
  val generator = () => {
    ….
  }
  Some(generator)
} else {
  None
}
---
It is better to use maybeSqlTypeGenerator.map instead of the ‘if … else …’ 
above, since ‘map’ already performs that check internally. That will make the 
code neater.
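
For illustration, a minimal, self-contained sketch of the suggested refactoring 
(toy types only; the real code wraps the underlying random-data generator, 
which is elided above as "…."):
{code}
// Option.map returns None for None and Some(f(x)) for Some(x),
// so the explicit isDefined/get branch is unnecessary.
object MapInsteadOfIfCheck {
  def forTypeSketch(maybeSqlTypeGenerator: Option[() => Int]): Option[() => Int] =
    maybeSqlTypeGenerator.map { sqlTypeGenerator =>
      () => sqlTypeGenerator() // wrap the underlying generator; body elided in the jira snippet
    }
}
{code}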




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org