[jira] [Commented] (SPARK-13955) Spark in yarn mode fails

2016-10-20 Thread Tzach Zohar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15593453#comment-15593453
 ] 

Tzach Zohar commented on SPARK-13955:
-------------------------------------

[~saisai_shao] can you clarify regarding option #1: when you say

bq. You need to zip all the jars and specify spark.yarn.archive with the path 
of zipped jars

How exactly should that archive look? 
We're upgrading from 1.6.2 and we keep getting the same error mentioned above:

bq. Error: Could not find or load main class 
org.apache.spark.deploy.yarn.ExecutorLauncher

We've tried using {{spark.yarn.archive}} with:
 - The Spark binary downloaded from the download page (e.g. 
http://d3kbcqa49mib13.cloudfront.net/spark-2.0.0-bin-hadoop2.6.tgz)
 - Creating a {{.zip}} file with the contents of the {{jars/}} folder from the 
downloaded binary
 - Creating a {{.tgz}} file with the contents of the {{jars/}} folder from the 
downloaded binary  
 - All of the above, with the file placed either on HDFS or locally on the 
driver machine

None of these resolves the issue. The only option that actually worked for us 
was the third one you mentioned - setting neither {{spark.yarn.jars}} nor 
{{spark.yarn.archive}} and making sure the right jars exist in 
{{SPARK_HOME/jars}} on each node - but since we run several applications with 
different Spark versions and want to simplify our provisioning, this isn't 
convenient for us.

Any clarification would be greatly appreciated, 
Thanks!

> Spark in yarn mode fails
> 
>
> Key: SPARK-13955
> URL: https://issues.apache.org/jira/browse/SPARK-13955
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.0.0
>Reporter: Jeff Zhang
>Assignee: Marcelo Vanzin
> Fix For: 2.0.0
>
>
> I ran spark-shell in yarn-client mode, but from the logs it seems the Spark 
> assembly jar is not uploaded to HDFS. This may be a known issue from the 
> SPARK-11157 work; creating this ticket to track it. [~vanzin]
> {noformat}
> 16/03/17 17:57:48 INFO Client: Will allocate AM container, with 896 MB memory 
> including 384 MB overhead
> 16/03/17 17:57:48 INFO Client: Setting up container launch context for our AM
> 16/03/17 17:57:48 INFO Client: Setting up the launch environment for our AM 
> container
> 16/03/17 17:57:48 INFO Client: Preparing resources for our AM container
> 16/03/17 17:57:48 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive 
> is set, falling back to uploading libraries under SPARK_HOME.
> 16/03/17 17:57:48 INFO Client: Uploading resource 
> file:/Users/jzhang/github/spark/lib/apache-rat-0.10.jar -> 
> hdfs://localhost:9000/user/jzhang/.sparkStaging/application_1458187008455_0006/apache-rat-0.10.jar
> 16/03/17 17:57:49 INFO Client: Uploading resource 
> file:/Users/jzhang/github/spark/lib/apache-rat-0.11.jar -> 
> hdfs://localhost:9000/user/jzhang/.sparkStaging/application_1458187008455_0006/apache-rat-0.11.jar
> 16/03/17 17:57:49 INFO Client: Uploading resource 
> file:/private/var/folders/dp/hmchg5dd3vbcvds26q91spdwgp/T/spark-abed04bf-6ac2-448b-91a9-dcc1c401a18f/__spark_conf__4163776487351314654.zip
>  -> 
> hdfs://localhost:9000/user/jzhang/.sparkStaging/application_1458187008455_0006/__spark_conf__4163776487351314654.zip
> 16/03/17 17:57:49 INFO SecurityManager: Changing view acls to: jzhang
> 16/03/17 17:57:49 INFO SecurityManager: Changing modify acls to: jzhang
> 16/03/17 17:57:49 INFO SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users with view permissions: Set(jzhang); users 
> with modify permissions: Set(jzhang)
> 16/03/17 17:57:49 INFO Client: Submitting application 6 to ResourceManager
> {noformat}
> message in AM container
> {noformat}
> Error: Could not find or load main class 
> org.apache.spark.deploy.yarn.ExecutorLauncher
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14958) Failed task hangs if error is encountered when getting task result

2016-09-29 Thread Tzach Zohar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15532058#comment-15532058
 ] 

Tzach Zohar commented on SPARK-14958:
-------------------------------------

I might be seeing the same thing - using Spark 1.6.2. [~lirui] which version 
did you see this on?
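If it is the same mechanism, one plausible explanation (speculation on my 
part, based only on the snippet quoted below): the inner catch clauses in 
{{enqueueFailedTask}} only match {{Exception}}s, while {{NoClassDefFoundError}} 
extends {{java.lang.Error}} - so it escapes {{run()}} before 
{{scheduler.handleFailedTask}} is reached. A minimal standalone illustration 
(not Spark code):

```scala
// A catch clause written for Exception does not match NoClassDefFoundError,
// which extends java.lang.Error, not Exception. Anything after the inner
// try/catch (like the handleFailedTask call) is therefore skipped.
def caughtByExceptionClause(t: Throwable): Boolean =
  try {
    throw t
  } catch {
    case _: Exception => true    // mirrors `case ex: Exception => {}`
    case _: Throwable => false   // Errors land here instead
  }

println(caughtByExceptionClause(new ClassNotFoundException("X")))  // true
println(caughtByExceptionClause(new NoClassDefFoundError("X")))    // false
```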

> Failed task hangs if error is encountered when getting task result
> --
>
> Key: SPARK-14958
> URL: https://issues.apache.org/jira/browse/SPARK-14958
> Project: Spark
>  Issue Type: Bug
>Reporter: Rui Li
>
> In {{TaskResultGetter}}, if we hit an error while deserializing 
> {{TaskEndReason}}, the TaskScheduler never gets a chance to handle the failed 
> task, and the task just hangs.
> {code}
>   def enqueueFailedTask(taskSetManager: TaskSetManager, tid: Long,
>       taskState: TaskState, serializedData: ByteBuffer) {
>     var reason: TaskEndReason = UnknownReason
>     try {
>       getTaskResultExecutor.execute(new Runnable {
>         override def run(): Unit = Utils.logUncaughtExceptions {
>           val loader = Utils.getContextOrSparkClassLoader
>           try {
>             if (serializedData != null && serializedData.limit() > 0) {
>               reason = serializer.get().deserialize[TaskEndReason](
>                 serializedData, loader)
>             }
>           } catch {
>             case cnd: ClassNotFoundException =>
>               // Log an error but keep going here -- the task failed, so not
>               // catastrophic if we can't deserialize the reason.
>               logError(
>                 "Could not deserialize TaskEndReason: ClassNotFound with classloader " + loader)
>             case ex: Exception => {}
>           }
>           scheduler.handleFailedTask(taskSetManager, tid, taskState, reason)
>         }
>       })
>     } catch {
>       case e: RejectedExecutionException if sparkEnv.isStopped =>
>         // ignore it
>     }
>   }
> {code}
> In my specific case, I got a NoClassDefFoundError and the failed task hangs 
> forever.






[jira] [Commented] (SPARK-13210) NPE in Sort

2016-09-16 Thread Tzach Zohar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15495921#comment-15495921
 ] 

Tzach Zohar commented on SPARK-13210:
-------------------------------------

I've just seen this happen on Spark 1.6.2 - a very similar stack trace (a line 
number changed slightly, but it's the same stack).

I'm guessing the fix wasn't ported to 1.6.1 after all? 
If so, maybe the 1.6.* versions should be added to "affected versions"?

{code:none}
java.lang.NullPointerException
    at org.apache.spark.memory.TaskMemoryManager.getPage(TaskMemoryManager.java:351)
    at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter$SortComparator.compare(UnsafeInMemorySorter.java:56)
    at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter$SortComparator.compare(UnsafeInMemorySorter.java:37)
    at org.apache.spark.util.collection.TimSort.countRunAndMakeAscending(TimSort.java:270)
    at org.apache.spark.util.collection.TimSort.sort(TimSort.java:142)
    at org.apache.spark.util.collection.Sorter.sort(Sorter.scala:37)
    at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.getSortedIterator(UnsafeInMemorySorter.java:235)
    at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill(UnsafeExternalSorter.java:186)
    at org.apache.spark.memory.TaskMemoryManager.acquireExecutionMemory(TaskMemoryManager.java:175)
    at org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:249)
    at org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:83)
    at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.reset(UnsafeInMemorySorter.java:122)
    at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill(UnsafeExternalSorter.java:201)
    at org.apache.spark.memory.TaskMemoryManager.acquireExecutionMemory(TaskMemoryManager.java:175)
    at org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:249)
    at org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:112)
    at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPageIfNecessary(UnsafeExternalSorter.java:332)
    at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertKVRecord(UnsafeExternalSorter.java:373)
    at org.apache.spark.sql.execution.UnsafeKVExternalSorter.insertKV(UnsafeKVExternalSorter.java:139)
    at org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer$$anonfun$writeRows$4.apply$mcV$sp(WriterContainer.scala:377)
    at org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer$$anonfun$writeRows$4.apply(WriterContainer.scala:343)
    at org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer$$anonfun$writeRows$4.apply(WriterContainer.scala:343)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1277)
    at org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer.writeRows(WriterContainer.scala:409)
    ... 8 more
{code}



> NPE in Sort
> ---
>
> Key: SPARK-13210
> URL: https://issues.apache.org/jira/browse/SPARK-13210
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Davies Liu
>Assignee: Davies Liu
>Priority: Critical
> Fix For: 2.0.0
>
>
> When run TPCDS query Q78 with scale 10:
> {code}
> 16/02/04 22:39:09 ERROR Executor: Managed memory leak detected; size = 
> 268435456 bytes, TID = 143
> 16/02/04 22:39:09 ERROR Executor: Exception in task 0.0 in stage 47.0 (TID 
> 143)
> java.lang.NullPointerException
>   at org.apache.spark.memory.TaskMemoryManager.getPage(TaskMemoryManager.java:333)
>   at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter$SortComparator.compare(UnsafeInMemorySorter.java:60)
>   at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter$SortComparator.compare(UnsafeInMemorySorter.java:39)
>   at org.apache.spark.util.collection.TimSort.countRunAndMakeAscending(TimSort.java:270)
>   at org.apache.spark.util.collection.TimSort.sort(TimSort.java:142)
>   at org.apache.spark.util.collection.Sorter.sort(Sorter.scala:37)
>   at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.getSortedIterator(UnsafeInMemorySorter.java:239)
>   at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.getSortedIterator(UnsafeExternalSorter.java:415)
>   at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:116)
>   at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:168)
>   at 

[jira] [Commented] (SPARK-3577) Add task metric to report spill time

2016-08-11 Thread Tzach Zohar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15417916#comment-15417916
 ] 

Tzach Zohar commented on SPARK-3577:


Does this mean that currently, spill time will be displayed as part of the 
*Scheduler Delay*? 
Scheduler Delay is calculated pretty much as "everything that isn't 
specifically measured" (see 
[StagePage.getSchedulerDelay|https://github.com/apache/spark/blob/v2.0.0/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala#L770]),
 so I'm wondering whether it might indeed include spill time, since spill time 
isn't accounted for anywhere else. 

If so, this might explain long Scheduler Delay values that would otherwise be 
hard to make sense of (which I think is what I'm seeing...).
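For reference, the computation is roughly this (a paraphrase of the linked 
code with simplified parameter names - not the actual Spark API):

```scala
// Paraphrase of StagePage.getSchedulerDelay (simplified; the parameter
// names here are mine, not Spark's). Scheduler delay is the wall-clock time
// left after subtracting everything that IS explicitly measured, so any
// unmeasured cost - such as spill time - would silently end up in this bucket.
def schedulerDelay(
    totalTaskTime: Long,       // finishTime - launchTime, in ms
    executorRunTime: Long,
    deserializeTime: Long,
    serializeTime: Long,
    gettingResultTime: Long): Long =
  math.max(0L,
    totalTaskTime - executorRunTime - deserializeTime - serializeTime - gettingResultTime)
```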

Thanks

> Add task metric to report spill time
> 
>
> Key: SPARK-3577
> URL: https://issues.apache.org/jira/browse/SPARK-3577
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 1.1.0
>Reporter: Kay Ousterhout
>Priority: Minor
>
> The {{ExternalSorter}} passes its own {{ShuffleWriteMetrics}} into 
> {{ExternalSorter}}.  The write time recorded in those metrics is never used.  
> We should probably add task metrics to report this spill time, since for 
> shuffles, this would have previously been reported as part of shuffle write 
> time (with the original hash-based sorter).






[jira] [Created] (SPARK-9936) decimal precision lost when loading DataFrame from RDD

2015-08-13 Thread Tzach Zohar (JIRA)
Tzach Zohar created SPARK-9936:
------------------------------

 Summary: decimal precision lost when loading DataFrame from RDD
 Key: SPARK-9936
 URL: https://issues.apache.org/jira/browse/SPARK-9936
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.4.0
Reporter: Tzach Zohar


It seems that when converting an RDD that contains BigDecimals into a DataFrame 
(using SQLContext.createDataFrame without specifying a schema), precision info 
is lost, which means saving as a Parquet file will fail (Parquet tries to 
verify precision <= 18, so it fails if precision is unset).

This seems to be similar to 
[SPARK-7196|https://issues.apache.org/jira/browse/SPARK-7196], which fixed the 
same issue for DataFrames created via JDBC.

To reproduce:
{code:none}
scala> val rdd: RDD[(String, BigDecimal)] = sc.parallelize(Seq(("a", BigDecimal.valueOf(0.234))))
rdd: org.apache.spark.rdd.RDD[(String, BigDecimal)] = ParallelCollectionRDD[0] at parallelize at <console>:23

scala> val df: DataFrame = new SQLContext(rdd.context).createDataFrame(rdd)
df: org.apache.spark.sql.DataFrame = [_1: string, _2: decimal(10,0)]

scala> df.write.parquet("/data/parquet-file")
15/08/13 10:30:07 ERROR executor.Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.RuntimeException: Unsupported datatype DecimalType()
{code}

To verify this is indeed caused by the precision being lost, I've tried 
manually changing the schema to include precision (by traversing the 
StructFields and replacing the DecimalTypes with altered DecimalTypes), 
creating a new DataFrame using this updated schema - and indeed it fixes the 
problem.

I'm using Spark 1.4.0.
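For anyone else hitting this on 1.4.x, the workaround looks roughly like this 
(a sketch, not an official recipe; the (18, 8) precision/scale is an arbitrary 
choice kept within Parquet's limit - pick whatever fits your data):

```scala
// Sketch of the schema-rewriting workaround described above.
// Assumption: DecimalType(precision, scale) is constructible this way in 1.4,
// and (18, 8) is an arbitrary choice within Parquet's precision limit.
import org.apache.spark.sql.types._

def withExplicitDecimals(schema: StructType): StructType =
  StructType(schema.fields.map {
    case StructField(name, _: DecimalType, nullable, metadata) =>
      StructField(name, DecimalType(18, 8), nullable, metadata)
    case other => other
  })

// Re-create the DataFrame with the fixed schema, then write:
//   val fixed = sqlContext.createDataFrame(df.rdd, withExplicitDecimals(df.schema))
//   fixed.write.parquet("/data/parquet-file")
```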






[jira] [Commented] (SPARK-9936) decimal precision lost when loading DataFrame from RDD

2015-08-13 Thread Tzach Zohar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14695307#comment-14695307
 ] 

Tzach Zohar commented on SPARK-9936:


[~viirya] indeed! I've just located the problematic line, and in master it's 
[fixed|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L134].
 I guess I'll stick to my workaround until 1.5 is released, thanks.

Should I close this issue? Is it a duplicate of an existing issue that I failed 
to find? Not sure I know the procedure here...

> decimal precision lost when loading DataFrame from RDD
> ------------------------------------------------------
>
> Key: SPARK-9936
> URL: https://issues.apache.org/jira/browse/SPARK-9936
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
> Affects Versions: 1.4.0
> Reporter: Tzach Zohar
>
> It seems that when converting an RDD that contains BigDecimals into a 
> DataFrame (using SQLContext.createDataFrame without specifying a schema), 
> precision info is lost, which means saving as a Parquet file will fail 
> (Parquet tries to verify precision <= 18, so it fails if precision is unset).
> This seems to be similar to 
> [SPARK-7196|https://issues.apache.org/jira/browse/SPARK-7196], which fixed 
> the same issue for DataFrames created via JDBC.
> To reproduce:
> {code:none}
> scala> val rdd: RDD[(String, BigDecimal)] = sc.parallelize(Seq(("a", BigDecimal.valueOf(0.234))))
> rdd: org.apache.spark.rdd.RDD[(String, BigDecimal)] = ParallelCollectionRDD[0] at parallelize at <console>:23
> scala> val df: DataFrame = new SQLContext(rdd.context).createDataFrame(rdd)
> df: org.apache.spark.sql.DataFrame = [_1: string, _2: decimal(10,0)]
> scala> df.write.parquet("/data/parquet-file")
> 15/08/13 10:30:07 ERROR executor.Executor: Exception in task 0.0 in stage 0.0 (TID 0)
> java.lang.RuntimeException: Unsupported datatype DecimalType()
> {code}
> To verify this is indeed caused by the precision being lost, I've tried 
> manually changing the schema to include precision (by traversing the 
> StructFields and replacing the DecimalTypes with altered DecimalTypes), 
> creating a new DataFrame using this updated schema - and indeed it fixes the 
> problem.
> I'm using Spark 1.4.0.






[jira] [Closed] (SPARK-9936) decimal precision lost when loading DataFrame from RDD

2015-08-13 Thread Tzach Zohar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzach Zohar closed SPARK-9936.
------------------------------
   Resolution: Fixed
Fix Version/s: 1.5.0

> decimal precision lost when loading DataFrame from RDD
> ------------------------------------------------------
>
> Key: SPARK-9936
> URL: https://issues.apache.org/jira/browse/SPARK-9936
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
> Affects Versions: 1.4.0
> Reporter: Tzach Zohar
> Fix For: 1.5.0
>
> It seems that when converting an RDD that contains BigDecimals into a 
> DataFrame (using SQLContext.createDataFrame without specifying a schema), 
> precision info is lost, which means saving as a Parquet file will fail 
> (Parquet tries to verify precision <= 18, so it fails if precision is unset).
> This seems to be similar to 
> [SPARK-7196|https://issues.apache.org/jira/browse/SPARK-7196], which fixed 
> the same issue for DataFrames created via JDBC.
> To reproduce:
> {code:none}
> scala> val rdd: RDD[(String, BigDecimal)] = sc.parallelize(Seq(("a", BigDecimal.valueOf(0.234))))
> rdd: org.apache.spark.rdd.RDD[(String, BigDecimal)] = ParallelCollectionRDD[0] at parallelize at <console>:23
> scala> val df: DataFrame = new SQLContext(rdd.context).createDataFrame(rdd)
> df: org.apache.spark.sql.DataFrame = [_1: string, _2: decimal(10,0)]
> scala> df.write.parquet("/data/parquet-file")
> 15/08/13 10:30:07 ERROR executor.Executor: Exception in task 0.0 in stage 0.0 (TID 0)
> java.lang.RuntimeException: Unsupported datatype DecimalType()
> {code}
> To verify this is indeed caused by the precision being lost, I've tried 
> manually changing the schema to include precision (by traversing the 
> StructFields and replacing the DecimalTypes with altered DecimalTypes), 
> creating a new DataFrame using this updated schema - and indeed it fixes the 
> problem.
> I'm using Spark 1.4.0.


