[jira] [Commented] (SPARK-13955) Spark in yarn mode fails
[ https://issues.apache.org/jira/browse/SPARK-13955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15593453#comment-15593453 ]

Tzach Zohar commented on SPARK-13955:
-------------------------------------

[~saisai_shao] can you clarify option #1? When you say

bq. You need to zip all the jars and specify spark.yarn.archive with the path of zipped jars

how exactly should that archive look? We're upgrading from 1.6.2, and we keep getting the same error mentioned above:

bq. Error: Could not find or load main class org.apache.spark.deploy.yarn.ExecutorLauncher

We've tried setting {{spark.yarn.archive}} to:
- the Spark binary downloaded from the download page (e.g. http://d3kbcqa49mib13.cloudfront.net/spark-2.0.0-bin-hadoop2.6.tgz)
- a {{.zip}} file with the contents of the {{jars/}} folder from the downloaded binary
- a {{.tgz}} file with the contents of the {{jars/}} folder from the downloaded binary
- each of the above, with the file placed either on HDFS or locally on the driver machine

None of these resolves the issue. The only option that actually worked for us is the third one you mentioned - setting neither {{spark.yarn.jars}} nor {{spark.yarn.archive}} and making sure the right jars exist in {{SPARK_HOME/jars}} on each node - but since we run several applications with different Spark versions and want to simplify our provisioning, this isn't convenient for us.

Any clarification would be greatly appreciated, thanks!

> Spark in yarn mode fails
> ------------------------
>
>                 Key: SPARK-13955
>                 URL: https://issues.apache.org/jira/browse/SPARK-13955
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 2.0.0
>            Reporter: Jeff Zhang
>            Assignee: Marcelo Vanzin
>             Fix For: 2.0.0
>
>
> I ran spark-shell in yarn-client mode, but from the logs it seems the spark assembly
> jar is not uploaded to HDFS. This may be a known issue from the work on SPARK-11157;
> creating this ticket to track it.
> [~vanzin]
> {noformat}
> 16/03/17 17:57:48 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
> 16/03/17 17:57:48 INFO Client: Setting up container launch context for our AM
> 16/03/17 17:57:48 INFO Client: Setting up the launch environment for our AM container
> 16/03/17 17:57:48 INFO Client: Preparing resources for our AM container
> 16/03/17 17:57:48 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
> 16/03/17 17:57:48 INFO Client: Uploading resource file:/Users/jzhang/github/spark/lib/apache-rat-0.10.jar -> hdfs://localhost:9000/user/jzhang/.sparkStaging/application_1458187008455_0006/apache-rat-0.10.jar
> 16/03/17 17:57:49 INFO Client: Uploading resource file:/Users/jzhang/github/spark/lib/apache-rat-0.11.jar -> hdfs://localhost:9000/user/jzhang/.sparkStaging/application_1458187008455_0006/apache-rat-0.11.jar
> 16/03/17 17:57:49 INFO Client: Uploading resource file:/private/var/folders/dp/hmchg5dd3vbcvds26q91spdwgp/T/spark-abed04bf-6ac2-448b-91a9-dcc1c401a18f/__spark_conf__4163776487351314654.zip -> hdfs://localhost:9000/user/jzhang/.sparkStaging/application_1458187008455_0006/__spark_conf__4163776487351314654.zip
> 16/03/17 17:57:49 INFO SecurityManager: Changing view acls to: jzhang
> 16/03/17 17:57:49 INFO SecurityManager: Changing modify acls to: jzhang
> 16/03/17 17:57:49 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(jzhang); users with modify permissions: Set(jzhang)
> 16/03/17 17:57:49 INFO Client: Submitting application 6 to ResourceManager
> {noformat}
> message in AM container
> {noformat}
> Error: Could not find or load main class org.apache.spark.deploy.yarn.ExecutorLauncher
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
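For reference, the layout {{spark.yarn.archive}} expects is an archive whose *root* directory contains the jars (i.e. no nested {{jars/}} folder inside the zip), which may be why the {{.zip}}/{{.tgz}} attempts above failed. A self-contained sketch of building such an archive; {{zipJarsAtRoot}} and the paths below are illustrative names, not Spark API:

```scala
// Sketch: build a spark.yarn.archive zip with every jar at the archive root.
// (zipJarsAtRoot is a hypothetical helper; assumption: jars must sit at the
// root of the archive, with no directory prefix on the entry names.)
import java.io.{File, FileInputStream, FileOutputStream}
import java.util.zip.{ZipEntry, ZipOutputStream}

def zipJarsAtRoot(jarsDir: File, out: File): Unit = {
  val zos = new ZipOutputStream(new FileOutputStream(out))
  try {
    for (jar <- jarsDir.listFiles().filter(_.getName.endsWith(".jar"))) {
      zos.putNextEntry(new ZipEntry(jar.getName)) // entry name only -- no "jars/" prefix
      val in = new FileInputStream(jar)
      try {
        val buf = new Array[Byte](8192)
        Iterator.continually(in.read(buf)).takeWhile(_ != -1).foreach(n => zos.write(buf, 0, n))
      } finally in.close()
      zos.closeEntry()
    }
  } finally zos.close()
}
```

The resulting archive would then be uploaded once (e.g. {{hdfs dfs -put spark-jars.zip /shared/spark/}}) and referenced with {{--conf spark.yarn.archive=hdfs:///shared/spark/spark-jars.zip}}, so multiple Spark versions can coexist without provisioning {{SPARK_HOME/jars}} on every node.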
[jira] [Commented] (SPARK-14958) Failed task hangs if error is encountered when getting task result
[ https://issues.apache.org/jira/browse/SPARK-14958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15532058#comment-15532058 ]

Tzach Zohar commented on SPARK-14958:
-------------------------------------

I might be seeing the same thing - using Spark 1.6.2. [~lirui] which version did you see this on?

> Failed task hangs if error is encountered when getting task result
> ------------------------------------------------------------------
>
>                 Key: SPARK-14958
>                 URL: https://issues.apache.org/jira/browse/SPARK-14958
>             Project: Spark
>          Issue Type: Bug
>            Reporter: Rui Li
>
> In {{TaskResultGetter}}, if we hit an error while deserializing
> {{TaskEndReason}}, the TaskScheduler never gets a chance to handle the failed
> task, and the task just hangs.
> {code}
>   def enqueueFailedTask(taskSetManager: TaskSetManager, tid: Long, taskState: TaskState,
>       serializedData: ByteBuffer) {
>     var reason: TaskEndReason = UnknownReason
>     try {
>       getTaskResultExecutor.execute(new Runnable {
>         override def run(): Unit = Utils.logUncaughtExceptions {
>           val loader = Utils.getContextOrSparkClassLoader
>           try {
>             if (serializedData != null && serializedData.limit() > 0) {
>               reason = serializer.get().deserialize[TaskEndReason](
>                 serializedData, loader)
>             }
>           } catch {
>             case cnd: ClassNotFoundException =>
>               // Log an error but keep going here -- the task failed, so not catastrophic
>               // if we can't deserialize the reason.
>               logError(
>                 "Could not deserialize TaskEndReason: ClassNotFound with classloader " + loader)
>             case ex: Exception => {}
>           }
>           scheduler.handleFailedTask(taskSetManager, tid, taskState, reason)
>         }
>       })
>     } catch {
>       case e: RejectedExecutionException if sparkEnv.isStopped =>
>         // ignore it
>     }
>   }
> {code}
> In my specific case, I got a NoClassDefFoundError and the failed task hangs
> forever.
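The hang in the report above follows from {{NoClassDefFoundError}} being an {{Error}}, not an {{Exception}}: neither {{catch}} arm matches, so the {{Runnable}} dies before {{scheduler.handleFailedTask}} runs. A self-contained sketch of the hardening idea; {{deserializeReasonSafely}} is a hypothetical helper, not Spark's actual fix:

```scala
// Sketch: make sure *some* failure reason always survives deserialization, even
// when the failure is a linkage Error rather than an Exception, so the scheduler
// can still mark the task as failed instead of letting it hang.
import scala.util.control.NonFatal

def deserializeReasonSafely(deserialize: () => String): String = {
  val fallback = "UnknownReason"
  try deserialize()
  catch {
    // NoClassDefFoundError extends LinkageError, so neither `case ex: Exception`
    // nor NonFatal(_) would match it -- it needs its own arm.
    case _: NoClassDefFoundError => fallback
    case NonFatal(_)             => fallback // any other recoverable failure
  }
}
```

Whatever reason this returns can then be handed to {{handleFailedTask}} unconditionally, which is the property the original code loses when an {{Error}} escapes the {{try}}.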
[jira] [Commented] (SPARK-13210) NPE in Sort
[ https://issues.apache.org/jira/browse/SPARK-13210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15495921#comment-15495921 ]

Tzach Zohar commented on SPARK-13210:
-------------------------------------

I've just seen this happen on Spark 1.6.2 - a very similar stack trace (line numbers changed slightly, but it's the same stack). I'm guessing the fix wasn't ported to 1.6.1 after all? If so, maybe the 1.6.* versions should be added to "affected versions"?

{code:none}
java.lang.NullPointerException
	at org.apache.spark.memory.TaskMemoryManager.getPage(TaskMemoryManager.java:351)
	at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter$SortComparator.compare(UnsafeInMemorySorter.java:56)
	at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter$SortComparator.compare(UnsafeInMemorySorter.java:37)
	at org.apache.spark.util.collection.TimSort.countRunAndMakeAscending(TimSort.java:270)
	at org.apache.spark.util.collection.TimSort.sort(TimSort.java:142)
	at org.apache.spark.util.collection.Sorter.sort(Sorter.scala:37)
	at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.getSortedIterator(UnsafeInMemorySorter.java:235)
	at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill(UnsafeExternalSorter.java:186)
	at org.apache.spark.memory.TaskMemoryManager.acquireExecutionMemory(TaskMemoryManager.java:175)
	at org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:249)
	at org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:83)
	at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.reset(UnsafeInMemorySorter.java:122)
	at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill(UnsafeExternalSorter.java:201)
	at org.apache.spark.memory.TaskMemoryManager.acquireExecutionMemory(TaskMemoryManager.java:175)
	at org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:249)
	at org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:112)
	at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPageIfNecessary(UnsafeExternalSorter.java:332)
	at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertKVRecord(UnsafeExternalSorter.java:373)
	at org.apache.spark.sql.execution.UnsafeKVExternalSorter.insertKV(UnsafeKVExternalSorter.java:139)
	at org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer$$anonfun$writeRows$4.apply$mcV$sp(WriterContainer.scala:377)
	at org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer$$anonfun$writeRows$4.apply(WriterContainer.scala:343)
	at org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer$$anonfun$writeRows$4.apply(WriterContainer.scala:343)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1277)
	at org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer.writeRows(WriterContainer.scala:409)
	... 8 more
{code}

> NPE in Sort
> -----------
>
>                 Key: SPARK-13210
>                 URL: https://issues.apache.org/jira/browse/SPARK-13210
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Davies Liu
>            Assignee: Davies Liu
>            Priority: Critical
>             Fix For: 2.0.0
>
>
> When running TPCDS query Q78 with scale 10:
> {code}
> 16/02/04 22:39:09 ERROR Executor: Managed memory leak detected; size = 268435456 bytes, TID = 143
> 16/02/04 22:39:09 ERROR Executor: Exception in task 0.0 in stage 47.0 (TID 143)
> java.lang.NullPointerException
> 	at org.apache.spark.memory.TaskMemoryManager.getPage(TaskMemoryManager.java:333)
> 	at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter$SortComparator.compare(UnsafeInMemorySorter.java:60)
> 	at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter$SortComparator.compare(UnsafeInMemorySorter.java:39)
> 	at org.apache.spark.util.collection.TimSort.countRunAndMakeAscending(TimSort.java:270)
> 	at org.apache.spark.util.collection.TimSort.sort(TimSort.java:142)
> 	at org.apache.spark.util.collection.Sorter.sort(Sorter.scala:37)
> 	at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.getSortedIterator(UnsafeInMemorySorter.java:239)
> 	at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.getSortedIterator(UnsafeExternalSorter.java:415)
> 	at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:116)
> 	at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:168)
> 	at
[jira] [Commented] (SPARK-3577) Add task metric to report spill time
[ https://issues.apache.org/jira/browse/SPARK-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15417916#comment-15417916 ]

Tzach Zohar commented on SPARK-3577:
------------------------------------

Does this mean that currently, spill time is displayed as part of the *Scheduler Delay*?

Scheduler Delay is calculated pretty much as "everything that isn't specifically measured" (see [StagePage.getSchedulerDelay|https://github.com/apache/spark/blob/v2.0.0/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala#L770]), so I'm wondering whether it might include spill time if spill time isn't recorded anywhere else. If so, this might explain long Scheduler Delay values that would otherwise be hard to make sense of (which I think is what I'm seeing...). Thanks

> Add task metric to report spill time
> ------------------------------------
>
>                 Key: SPARK-3577
>                 URL: https://issues.apache.org/jira/browse/SPARK-3577
>             Project: Spark
>          Issue Type: Bug
>          Components: Shuffle, Spark Core
>    Affects Versions: 1.1.0
>            Reporter: Kay Ousterhout
>            Priority: Minor
>
> The {{ExternalSorter}} passes its own {{ShuffleWriteMetrics}} into
> {{ExternalSorter}}. The write time recorded in those metrics is never used.
> We should probably add task metrics to report this spill time, since for
> shuffles, this would have previously been reported as part of shuffle write
> time (with the original hash-based sorter).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
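The "everything that isn't specifically measured" computation referenced in the comment above amounts to subtracting the recorded components from the wall-clock task duration. A self-contained sketch of that idea; the field names are illustrative, not the actual {{StagePage.getSchedulerDelay}} signature:

```scala
// Sketch: scheduler delay as the unaccounted-for remainder of task duration.
// Any time not captured by an explicit metric (such as spill time before
// SPARK-3577) silently ends up in this bucket, which is why it can look
// inexplicably large in the UI.
def schedulerDelay(taskDuration: Long,
                   executorRunTime: Long,
                   serializationTime: Long,
                   gettingResultTime: Long): Long =
  math.max(0L, taskDuration - executorRunTime - serializationTime - gettingResultTime)
```

Under this model, any unmetered work inside the task inflates the remainder rather than showing up under its own column.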
[jira] [Created] (SPARK-9936) decimal precision lost when loading DataFrame from RDD
Tzach Zohar created SPARK-9936:
----------------------------------

             Summary: decimal precision lost when loading DataFrame from RDD
                 Key: SPARK-9936
                 URL: https://issues.apache.org/jira/browse/SPARK-9936
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.4.0
            Reporter: Tzach Zohar


It seems that when converting an RDD that contains BigDecimals into a DataFrame (using SQLContext.createDataFrame without specifying a schema), precision info is lost, which means saving as a Parquet file fails (Parquet tries to verify precision <= 18, so it fails if precision is unset).

This seems to be similar to [SPARK-7196|https://issues.apache.org/jira/browse/SPARK-7196], which fixed the same issue for DataFrames created via JDBC.

To reproduce:
{code:none}
scala> val rdd: RDD[(String, BigDecimal)] = sc.parallelize(Seq(("a", BigDecimal.valueOf(0.234))))
rdd: org.apache.spark.rdd.RDD[(String, BigDecimal)] = ParallelCollectionRDD[0] at parallelize at <console>:23

scala> val df: DataFrame = new SQLContext(rdd.context).createDataFrame(rdd)
df: org.apache.spark.sql.DataFrame = [_1: string, _2: decimal(10,0)]

scala> df.write.parquet("/data/parquet-file")
15/08/13 10:30:07 ERROR executor.Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.RuntimeException: Unsupported datatype DecimalType()
{code}

To verify this is indeed caused by the precision being lost, I've tried manually changing the schema to include precision (by traversing the StructFields and replacing the DecimalTypes with altered DecimalTypes) and creating a new DataFrame using this updated schema - and indeed that fixes the problem.

I'm using Spark 1.4.0.
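The workaround described above - walking the StructFields and swapping every precision-less DecimalType for one with explicit precision/scale - can be sketched as follows. To keep the snippet self-contained it uses stand-in case classes that only mirror the shape of {{org.apache.spark.sql.types.StructType}}/{{StructField}}/{{DecimalType}}; the real code would pattern-match on those Spark types instead:

```scala
// Stand-in model of the relevant Spark SQL types (NOT the real Spark classes;
// they stand in so the schema-rewriting idea can be shown without a Spark dependency).
sealed trait DataType
case object StringType extends DataType
final case class DecimalType(precisionAndScale: Option[(Int, Int)]) extends DataType
final case class StructField(name: String, dataType: DataType)
final case class StructType(fields: Seq[StructField])

// Rebuild the schema, pinning explicit precision/scale on every unset DecimalType
// and leaving all other fields untouched.
def withDecimalPrecision(schema: StructType, precision: Int, scale: Int): StructType =
  StructType(schema.fields.map {
    case StructField(name, DecimalType(None)) =>
      StructField(name, DecimalType(Some((precision, scale))))
    case other => other
  })
```

With real Spark types, a DataFrame rebuilt from the fixed schema (e.g. via {{sqlContext.createDataFrame(df.rdd, fixedSchema)}}) should then write to Parquet without the {{Unsupported datatype DecimalType()}} error.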
[jira] [Commented] (SPARK-9936) decimal precision lost when loading DataFrame from RDD
[ https://issues.apache.org/jira/browse/SPARK-9936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14695307#comment-14695307 ]

Tzach Zohar commented on SPARK-9936:
------------------------------------

[~viirya] indeed! I've just located the problematic line, and in master it's [fixed|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L134]. I guess I'll stick to my workaround until 1.5 is released, thanks.

Should I close this issue? Is it a duplicate of an existing issue that I failed to find? Not sure I know the procedure here...

> decimal precision lost when loading DataFrame from RDD
> ------------------------------------------------------
>
>                 Key: SPARK-9936
>                 URL: https://issues.apache.org/jira/browse/SPARK-9936
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.4.0
>            Reporter: Tzach Zohar
>
> It seems that when converting an RDD that contains BigDecimals into a DataFrame
> (using SQLContext.createDataFrame without specifying a schema), precision info is
> lost, which means saving as a Parquet file fails (Parquet tries to verify
> precision <= 18, so it fails if precision is unset).
> This seems to be similar to [SPARK-7196|https://issues.apache.org/jira/browse/SPARK-7196],
> which fixed the same issue for DataFrames created via JDBC.
> To reproduce:
> {code:none}
> scala> val rdd: RDD[(String, BigDecimal)] = sc.parallelize(Seq(("a", BigDecimal.valueOf(0.234))))
> rdd: org.apache.spark.rdd.RDD[(String, BigDecimal)] = ParallelCollectionRDD[0] at parallelize at <console>:23
>
> scala> val df: DataFrame = new SQLContext(rdd.context).createDataFrame(rdd)
> df: org.apache.spark.sql.DataFrame = [_1: string, _2: decimal(10,0)]
>
> scala> df.write.parquet("/data/parquet-file")
> 15/08/13 10:30:07 ERROR executor.Executor: Exception in task 0.0 in stage 0.0 (TID 0)
> java.lang.RuntimeException: Unsupported datatype DecimalType()
> {code}
> To verify this is indeed caused by the precision being lost, I've tried manually
> changing the schema to include precision (by traversing the StructFields and
> replacing the DecimalTypes with altered DecimalTypes) and creating a new DataFrame
> using this updated schema - and indeed that fixes the problem.
> I'm using Spark 1.4.0.
[jira] [Closed] (SPARK-9936) decimal precision lost when loading DataFrame from RDD
[ https://issues.apache.org/jira/browse/SPARK-9936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tzach Zohar closed SPARK-9936.
------------------------------
       Resolution: Fixed
    Fix Version/s: 1.5.0

> decimal precision lost when loading DataFrame from RDD
> ------------------------------------------------------
>
>                 Key: SPARK-9936
>                 URL: https://issues.apache.org/jira/browse/SPARK-9936
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.4.0
>            Reporter: Tzach Zohar
>             Fix For: 1.5.0
>
>
> It seems that when converting an RDD that contains BigDecimals into a DataFrame
> (using SQLContext.createDataFrame without specifying a schema), precision info is
> lost, which means saving as a Parquet file fails (Parquet tries to verify
> precision <= 18, so it fails if precision is unset).
> This seems to be similar to [SPARK-7196|https://issues.apache.org/jira/browse/SPARK-7196],
> which fixed the same issue for DataFrames created via JDBC.
> To reproduce:
> {code:none}
> scala> val rdd: RDD[(String, BigDecimal)] = sc.parallelize(Seq(("a", BigDecimal.valueOf(0.234))))
> rdd: org.apache.spark.rdd.RDD[(String, BigDecimal)] = ParallelCollectionRDD[0] at parallelize at <console>:23
>
> scala> val df: DataFrame = new SQLContext(rdd.context).createDataFrame(rdd)
> df: org.apache.spark.sql.DataFrame = [_1: string, _2: decimal(10,0)]
>
> scala> df.write.parquet("/data/parquet-file")
> 15/08/13 10:30:07 ERROR executor.Executor: Exception in task 0.0 in stage 0.0 (TID 0)
> java.lang.RuntimeException: Unsupported datatype DecimalType()
> {code}
> To verify this is indeed caused by the precision being lost, I've tried manually
> changing the schema to include precision (by traversing the StructFields and
> replacing the DecimalTypes with altered DecimalTypes) and creating a new DataFrame
> using this updated schema - and indeed that fixes the problem.
> I'm using Spark 1.4.0.