Re: A Spark Compilation Question
Hi Hansu,

I have encountered the same problem. Maven compiles the Avro files and generates the corresponding Java files in a new directory that is not a source directory of the project. I modified the pom.xml file and it works now. The lines marked as red are the additions; you can add them to your spark-*.*.*/external/flume-sink/pom.xml:

    <plugin>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro-maven-plugin</artifactId>
      <version>${avro.version}</version>
      <configuration>
        <!-- Generate the output in the same directory as the sbt-avro-plugin -->
        <outputDirectory>${project.basedir}/target/scala-${scala.binary.version}/src_managed/main/compiled_avro</outputDirectory>
        <outputDirectory>${project.basedir}/src/main/java</outputDirectory>
      </configuration>
      <executions>
        <execution>
          <phase>generate-sources</phase>
          <goals>
            <goal>idl-protocol</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
    <plugin>
      <groupId>org.codehaus.mojo</groupId>
      <artifactId>build-helper-maven-plugin</artifactId>
      <version>1.9.1</version>
      <executions>
        <execution>
          <id>add-source</id>
          <phase>generate-sources</phase>
          <goals>
            <goal>add-source</goal>
          </goals>
          <configuration>
            <sources>
              <source>${project.basedir}/src/main/java</source>
            </sources>
          </configuration>
        </execution>
      </executions>
    </plugin>

2014-09-13 2:45 GMT+08:00 Hansu GU guha...@gmail.com:

> I downloaded the source and imported it into IntelliJ 13.1 as a Maven
> project. When I used IntelliJ Build -> Make Project, I encountered:
>
>     Error:(44, 66) not found: type SparkFlumeProtocol
>     val transactionTimeout: Int, val backOffInterval: Int) extends SparkFlumeProtocol with Logging {
>
> I think some Avro-generated files are missing, but I am not sure. Could
> anyone help me understand this so that I can successfully compile the
> source?
>
> Thanks,
> Hansu
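[Editor's note: a quick way to confirm the generated sources appear after this change. This is a hedged suggestion based on standard Maven flags, not something from the thread; the module path is taken from the post above.]

    # Regenerate sources for just the flume-sink module (and the modules it
    # depends on), then look for the generated SparkFlumeProtocol.java under
    # the configured outputDirectory.
    mvn -pl external/flume-sink -am generate-sources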
executorAdded event to DAGScheduler
Can someone explain the motivation behind passing the executorAdded event to DAGScheduler? *DAGScheduler* does *submitWaitingStages* when the *executorAdded* method is called by *TaskSchedulerImpl*. I see an issue in the code below.

*TaskSchedulerImpl.scala code*

    if (!executorsByHost.contains(o.host)) {
      executorsByHost(o.host) = new HashSet[String]()
      executorAdded(o.executorId, o.host)
      newExecAvail = true
    }

Note that executorAdded is called only when there is a new host, not for every new executor. For instance, there can be two executors in the same host, and in that case the DAGScheduler's executorAdded is notified only for the new host, so only once. If this is indeed an issue, I would like to submit a patch for this quickly.

[cc Andrew Or]

- Praveen
Re: executorAdded event to DAGScheduler
Some corrections.

On Fri, Sep 26, 2014 at 5:32 PM, praveen seluka praveen.sel...@gmail.com wrote:

> Can someone explain the motivation behind passing the executorAdded event to
> DAGScheduler? *DAGScheduler* does *submitWaitingStages* when the
> *executorAdded* method is called by *TaskSchedulerImpl*. I see an issue in
> the code below.
>
> *TaskSchedulerImpl.scala code*
>
>     if (!executorsByHost.contains(o.host)) {
>       executorsByHost(o.host) = new HashSet[String]()
>       executorAdded(o.executorId, o.host)
>       newExecAvail = true
>     }
>
> Note that executorAdded is called only when there is a new host, not for
> every new executor. For instance, there can be two executors in the same
> host, and in this case the DAGScheduler is notified only once.
>
> If this is indeed an issue, I would like to submit a patch for this quickly.
>
> [cc Andrew Or]
>
> - Praveen
Re: executorAdded event to DAGScheduler
Just a quick reply: we cannot start two executors on the same host for a single application in the standard deployment (one worker per machine). I'm not sure whether it creates an issue when you have multiple workers on the same host, as submitWaitingStages is called everywhere and I have never tried such a deployment mode.

Best,

--
Nan Zhu

On Friday, September 26, 2014 at 8:02 AM, praveen seluka wrote:

> Can someone explain the motivation behind passing the executorAdded event to
> DAGScheduler? DAGScheduler does submitWaitingStages when the executorAdded
> method is called by TaskSchedulerImpl. I see an issue in the code below.
>
> TaskSchedulerImpl.scala code
>
>     if (!executorsByHost.contains(o.host)) {
>       executorsByHost(o.host) = new HashSet[String]()
>       executorAdded(o.executorId, o.host)
>       newExecAvail = true
>     }
>
> Note that executorAdded is called only when there is a new host, not for
> every new executor. For instance, there can be two executors in the same
> host, and in that case the DAGScheduler's executorAdded is notified only for
> the new host, so only once.
>
> If this is indeed an issue, I would like to submit a patch for this quickly.
>
> [cc Andrew Or]
>
> - Praveen
Re: executorAdded event to DAGScheduler
In YARN, we can easily have multiple containers allocated on the same node.

On Fri, Sep 26, 2014 at 6:05 PM, Nan Zhu zhunanmcg...@gmail.com wrote:

> Just a quick reply: we cannot start two executors on the same host for a
> single application in the standard deployment (one worker per machine). I'm
> not sure whether it creates an issue when you have multiple workers on the
> same host, as submitWaitingStages is called everywhere and I have never
> tried such a deployment mode.
>
> Best,
>
> --
> Nan Zhu
>
> On Friday, September 26, 2014 at 8:02 AM, praveen seluka wrote:
>
> > Can someone explain the motivation behind passing the executorAdded event
> > to DAGScheduler? *DAGScheduler* does *submitWaitingStages* when the
> > *executorAdded* method is called by *TaskSchedulerImpl*. I see an issue in
> > the code below.
> >
> > *TaskSchedulerImpl.scala code*
> >
> >     if (!executorsByHost.contains(o.host)) {
> >       executorsByHost(o.host) = new HashSet[String]()
> >       executorAdded(o.executorId, o.host)
> >       newExecAvail = true
> >     }
> >
> > Note that executorAdded is called only when there is a new host, not for
> > every new executor. For instance, there can be two executors in the same
> > host, and in that case the DAGScheduler's executorAdded is notified only
> > for the new host, so only once.
> >
> > If this is indeed an issue, I would like to submit a patch for this
> > quickly.
> >
> > [cc Andrew Or]
> >
> > - Praveen
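[Editor's note: to make the proposed change concrete, here is a self-contained sketch. It is not Spark's actual code: WorkerOffer is simplified and ExecutorAddedSketch is a made-up name. The idea is to key the DAGScheduler notification on previously unseen executor IDs rather than on new hosts, so two executors on one YARN node both trigger the callback.]

    import scala.collection.mutable.{HashMap, HashSet}

    case class WorkerOffer(executorId: String, host: String)

    object ExecutorAddedSketch {
      // Mirrors TaskSchedulerImpl's bookkeeping in spirit only.
      val executorsByHost = new HashMap[String, HashSet[String]]()
      val activeExecutorIds = new HashSet[String]()

      // Stand-in for the DAGScheduler callback.
      def executorAdded(execId: String, host: String): Unit =
        println(s"DAGScheduler notified: $execId on $host")

      def resourceOffers(offers: Seq[WorkerOffer]): Unit =
        for (o <- offers) {
          executorsByHost.getOrElseUpdate(o.host, new HashSet[String]())
          // Key the notification on the executor ID, not the host, so a
          // second executor on an already-known host still reaches the
          // DAGScheduler.
          if (!activeExecutorIds.contains(o.executorId)) {
            activeExecutorIds += o.executorId
            executorsByHost(o.host) += o.executorId
            executorAdded(o.executorId, o.host)
          }
        }

      def main(args: Array[String]): Unit =
        // The YARN case: two containers on one node now yield two callbacks.
        resourceOffers(Seq(WorkerOffer("exec-1", "host-A"), WorkerOffer("exec-2", "host-A")))
    }

Running this prints two "DAGScheduler notified" lines for host-A, where the host-keyed version in the quoted snippet would fire only once.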
Re: PARSING_ERROR from kryo
I am seeing the same error as well since upgrading to Spark 1.1:

    14/09/26 15:35:05 ERROR executor.Executor: Exception in task 1032.0 in stage 5.1 (TID 22449)
    com.esotericsoftware.kryo.KryoException: java.io.IOException: failed to uncompress the chunk: PARSING_ERROR(2)
        at com.esotericsoftware.kryo.io.Input.fill(Input.java:142)
        at com.esotericsoftware.kryo.io.Input.require(Input.java:155)
        at com.esotericsoftware.kryo.io.Input.readInt(Input.java:337)
        at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:109)
        at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:610)
        at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:721)
        at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:133)
        at org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:133)
        at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
        at org.apache.spark.storage.BlockManager$LazyProxyIterator$1.hasNext(BlockManager.scala:1082)
        at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
        at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)

Out of 6000 tasks, some 5000 finish fine, so I don't believe there is an issue with the serialization itself, and with some other datasets everything works fine. Also, the same code on the same dataset worked fine with Spark 1.0.2.

On Mon, Sep 15, 2014 at 9:57 PM, npanj nitinp...@gmail.com wrote:

> Hi Andrew,
>
> No, I could not figure out the root cause. This seems to be a
> non-deterministic error; I didn't see the same error after rerunning the
> same program. But I noticed the same error in a different program. At first
> I thought this might be related to SPARK-2878, but @Graham replied that it
> looks irrelevant.
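[Editor's note: one hedged diagnostic, an assumption on my part rather than a fix confirmed in this thread. PARSING_ERROR(2) is raised by Snappy while decompressing a shuffle block, so temporarily switching the compression codec can show whether the corruption is Snappy-specific. A minimal sketch:]

    import org.apache.spark.{SparkConf, SparkContext}

    object KryoParsingErrorCheck {
      def main(args: Array[String]): Unit = {
        // Diagnostic only: swap the default snappy codec for lzf to see
        // whether "failed to uncompress the chunk: PARSING_ERROR(2)" goes
        // away, which would point at the snappy-compressed shuffle blocks
        // rather than Kryo itself.
        val conf = new SparkConf()
          .setAppName("kryo-parsing-error-check")
          .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
          .set("spark.io.compression.codec", "lzf")
        val sc = new SparkContext(conf)
        try {
          // Re-run the failing job here with the alternate codec.
        } finally {
          sc.stop()
        }
      }
    }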
FYI: jenkins systems patched to fix bash exploit
all of our systems were affected by the shellshock bug, and i've just patched everything w/the latest fix from redhat: https://access.redhat.com/articles/1200223 we're not running bash.x86_64 0:4.1.2-15.el6_5.2 on all of our systems. shane
Re: FYI: jenkins systems patched to fix bash exploit
> we're not running bash.x86_64 0:4.1.2-15.el6_5.2 on all of our systems.

s/not/now :)
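[Editor's note: for anyone verifying their own hosts, the widely circulated shellshock check. This is not part of shane's message, just the standard test.]

    # An unpatched bash prints "vulnerable" before the test line;
    # a patched bash prints only "this is a test".
    env x='() { :;}; echo vulnerable' bash -c "echo this is a test"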
thank you for reviewing our patches
I recently came across this mailing list post by Linus Torvalds https://lkml.org/lkml/2004/12/20/255 about the value of reviewing even “trivial” patches. The following passages stood out to me:

> I think that much more important than the patch is the fact that people get
> used to the notion that they can change the kernel …
>
> So please don’t stop. Yes, those trivial patches *are* a bother. Damn, they
> are *horrible*. But at the same time, the devil is in the detail, and they
> are needed in the long run. Both the patches themselves, and the people
> that grew up on them.

Spark is the first (and currently only) open source project I contribute regularly to. My first several PRs against the project, as simple as they were, were definitely patches that I “grew up on”. I appreciate the time and effort all the reviewers I’ve interacted with have taken to work with me on my PRs, even when they are “trivial”. And I’m sure that as I continue to contribute to this project there will be many more patches that I will “grow up on”.

Thank you Patrick, Reynold, Josh, Davies, Michael, and everyone else who’s taken time to review one of my patches. I appreciate it!

Nick
Re: thank you for reviewing our patches
Keep the patches coming :)

On Fri, Sep 26, 2014 at 1:50 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote:

> I recently came across this mailing list post by Linus Torvalds
> https://lkml.org/lkml/2004/12/20/255 about the value of reviewing even
> “trivial” patches. The following passages stood out to me:
>
> > I think that much more important than the patch is the fact that people
> > get used to the notion that they can change the kernel …
> >
> > So please don’t stop. Yes, those trivial patches *are* a bother. Damn,
> > they are *horrible*. But at the same time, the devil is in the detail,
> > and they are needed in the long run. Both the patches themselves, and
> > the people that grew up on them.
>
> Spark is the first (and currently only) open source project I contribute
> regularly to. My first several PRs against the project, as simple as they
> were, were definitely patches that I “grew up on”. I appreciate the time
> and effort all the reviewers I’ve interacted with have taken to work with
> me on my PRs, even when they are “trivial”. And I’m sure that as I continue
> to contribute to this project there will be many more patches that I will
> “grow up on”.
>
> Thank you Patrick, Reynold, Josh, Davies, Michael, and everyone else who’s
> taken time to review one of my patches. I appreciate it!
>
> Nick
Re: SparkSQL: map type MatchError when inserting into Hive table
Would you mind providing the DDL of this partitioned table together with the query you tried? The stacktrace suggests that the query was trying to cast a map into something else, which is not supported in Spark SQL. And I doubt whether Hive supports casting a complex type to some other type.

On 9/27/14 7:48 AM, Du Li wrote:

> Hi,
>
> I was loading data into a partitioned table on Spark 1.1.0 through beeline
> and the thrift server. The table has complex data types such as
> map<string,string> and array<map<string,string>>. The query is like
> “insert overwrite table a partition (…) select …” and the select clause
> worked when run separately. However, when running the insert query, there
> was an error as follows. The source code of Cast.scala seems to handle
> only the primitive data types, which is perhaps why the MatchError was
> thrown. I just wonder if this is still work in progress, or whether I
> should do it differently.
>
> Thanks,
> Du
>
>     scala.MatchError: MapType(StringType,StringType,true) (of class org.apache.spark.sql.catalyst.types.MapType)
>         org.apache.spark.sql.catalyst.expressions.Cast.cast$lzycompute(Cast.scala:247)
>         org.apache.spark.sql.catalyst.expressions.Cast.cast(Cast.scala:247)
>         org.apache.spark.sql.catalyst.expressions.Cast.eval(Cast.scala:263)
>         org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:84)
>         org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:66)
>         org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:50)
>         scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>         scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>         org.apache.spark.sql.hive.execution.InsertIntoHiveTable.org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1(InsertIntoHiveTable.scala:149)
>         org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$1.apply(InsertIntoHiveTable.scala:158)
>         org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$1.apply(InsertIntoHiveTable.scala:158)
>         org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
>         org.apache.spark.scheduler.Task.run(Task.scala:54)
>         org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
>         java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         java.lang.Thread.run(Thread.java:722)
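[Editor's note: to make the request concrete, a hypothetical reconstruction of the failing shape. The table and column names are invented, not Du's actual DDL, and the partition value is made up.]

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object MapTypeMatchErrorRepro {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setMaster("local").setAppName("maptype-matcherror-repro"))
        val hive = new HiveContext(sc)

        // A partitioned table with a MAP column, plus a source table to select from.
        hive.sql("CREATE TABLE IF NOT EXISTS a (m MAP<STRING,STRING>) PARTITIONED BY (dt STRING)")
        hive.sql("CREATE TABLE IF NOT EXISTS src (m MAP<STRING,STRING>)")

        // The SELECT alone succeeds, as described in the post:
        hive.sql("SELECT m FROM src").collect()

        // The INSERT routes the projection through Cast expressions; in Spark
        // 1.1 Cast.scala matches only primitive types, so this throws
        // scala.MatchError: MapType(StringType,StringType,true).
        hive.sql("INSERT OVERWRITE TABLE a PARTITION (dt = '2014-09-27') SELECT m FROM src")

        sc.stop()
      }
    }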