Re: A Spark Compilation Question

2014-09-26 Thread Yanbo Liang
Hi Hansu,

I have encountered the same problem. Maven compiles the avro files and generates
the corresponding Java files in a new directory which is not a source directory
of the project.

I modified the pom.xml file and it works now.
The added/changed lines (originally marked in red) are shown below; you can add
them to your spark-*.*.*/external/flume-sink/pom.xml.

<plugin>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-maven-plugin</artifactId>
  <version>${avro.version}</version>
  <configuration>
    <!-- Original setting: generate the output in the same directory as the sbt-avro-plugin -->
    <!-- <outputDirectory>${project.basedir}/target/scala-${scala.binary.version}/src_managed/main/compiled_avro</outputDirectory> -->
    <!-- Changed to generate directly into the project's Java source directory: -->
    <outputDirectory>${project.basedir}/src/main/java</outputDirectory>
  </configuration>
  <executions>
    <execution>
      <phase>generate-sources</phase>
      <goals>
        <goal>idl-protocol</goal>
      </goals>
    </execution>
  </executions>
</plugin>

<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>build-helper-maven-plugin</artifactId>
  <version>1.9.1</version>
  <executions>
    <execution>
      <id>add-source</id>
      <phase>generate-sources</phase>
      <goals>
        <goal>add-source</goal>
      </goals>
      <configuration>
        <sources>
          <source>${project.basedir}/src/main/java</source>
        </sources>
      </configuration>
    </execution>
  </executions>
</plugin>




2014-09-13 2:45 GMT+08:00 Hansu GU guha...@gmail.com:

 I downloaded the source and imported it into IntelliJ 13.1 as a Maven
 project.

 When I used IntelliJ's Build -> Make Project, I encountered:

 Error:(44, 66) not found: type SparkFlumeProtocol val
 transactionTimeout: Int, val backOffInterval: Int) extends
 SparkFlumeProtocol with Logging {

 I think some avro-generated files are missing, but I am not sure.
 Could anyone help me understand this so that I can successfully compile
 the source?

 Thanks,
 Hansu

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org




executorAdded event to DAGScheduler

2014-09-26 Thread praveen seluka
Can someone explain the motivation behind passing the executorAdded event to
DAGScheduler? *DAGScheduler *does *submitWaitingStages *when the *executorAdded
*method is called by *TaskSchedulerImpl*. I see an issue in the code below,

*TaskSchedulerImpl.scala code*
if (!executorsByHost.contains(o.host)) {
  executorsByHost(o.host) = new HashSet[String]()
  executorAdded(o.executorId, o.host)
  newExecAvail = true
}

Note that executorAdded is called only when there is a new host, not for
every new executor. For instance, there can be two executors on the same
host, but DAGScheduler's executorAdded is notified only for the new host - so
only once in this case. If this is indeed an issue, I would like to submit a
patch for this quickly. [cc Andrew Or]

- Praveen
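
For illustration only, here is a minimal, self-contained Scala sketch (hypothetical names, not the actual TaskSchedulerImpl code) of the behaviour Praveen is suggesting: notify the listener once per newly seen executor rather than once per newly seen host.

import scala.collection.mutable.{HashMap, HashSet}

object ExecutorTracking {
  // executors seen so far, grouped by host
  val executorsByHost = new HashMap[String, HashSet[String]]

  // stand-in for the DAGScheduler notification
  def executorAdded(execId: String, host: String): Unit =
    println(s"DAGScheduler notified: executor $execId on host $host")

  // called for every resource offer
  def registerOffer(execId: String, host: String): Unit = {
    val execs = executorsByHost.getOrElseUpdate(host, new HashSet[String]())
    if (execs.add(execId)) {        // true only the first time we see this executor
      executorAdded(execId, host)   // fired once per new executor, not once per new host
    }
  }
}

// ExecutorTracking.registerOffer("exec-1", "host-a")  // notifies
// ExecutorTracking.registerOffer("exec-2", "host-a")  // also notifies: same host, new executor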


Re: executorAdded event to DAGScheduler

2014-09-26 Thread praveen seluka
Some corrections.

On Fri, Sep 26, 2014 at 5:32 PM, praveen seluka praveen.sel...@gmail.com
wrote:

 Can someone explain the motivation behind passing executorAdded event to
 DAGScheduler ? *DAGScheduler *does *submitWaitingStages *when *executorAdded
 *method is called by *TaskSchedulerImpl*. I see some issue in the below
 code,

 *TaskSchedulerImpl.scala code*
 if (!executorsByHost.contains(o.host)) {
 executorsByHost(o.host) = new HashSet[String]()
 executorAdded(o.executorId, o.host)
 newExecAvail = true
   }

 Note that executorAdded is called only when there is a new host and not
 for every new executor. For instance, there can be two executors in the
 same host and in this case the DAGScheduler is notified only once. If this
 is indeed an issue, I would like to submit a patch for this quickly. [cc
 Andrew Or]

 - Praveen





Re: executorAdded event to DAGScheduler

2014-09-26 Thread Nan Zhu
Just a quick reply: we cannot start two executors on the same host for a single
application in the standard deployment (one worker per machine).

I'm not sure whether it will create an issue when you have multiple workers on the
same host, as submitWaitingStages is called everywhere and I have never tried such
a deployment mode.

Best,  

--  
Nan Zhu


On Friday, September 26, 2014 at 8:02 AM, praveen seluka wrote:

 Can someone explain the motivation behind passing executorAdded event to 
 DAGScheduler ? DAGScheduler does submitWaitingStages when executorAdded 
 method is called by TaskSchedulerImpl. I see some issue in the below code,
  
 TaskSchedulerImpl.scala code
 if (!executorsByHost.contains(o.host)) {
 executorsByHost(o.host) = new HashSet[String]()
 executorAdded(o.executorId, o.host)
 newExecAvail = true
   }
  
  
 Note that executorAdded is called only when there is a new host, not for
 every new executor. For instance, there can be two executors on the same host,
 but DAGScheduler's executorAdded is notified only for the new host - so only
 once in this case. If this is indeed an issue, I would like to submit a patch
 for this quickly. [cc Andrew Or]
  
 - Praveen
  
  



Re: executorAdded event to DAGScheduler

2014-09-26 Thread praveen seluka
In YARN, we can easily have multiple containers allocated on the same node.

On Fri, Sep 26, 2014 at 6:05 PM, Nan Zhu zhunanmcg...@gmail.com wrote:

  just a quick reply, we cannot start two executors in the same host for a
 single application in the standard deployment (one worker per machine)

 I’m not sure if it will create an issue when you have multiple workers in
 the same host, as submitWaitingStages is called everywhere and I never
 try such a deployment mode

 Best,

 --
 Nan Zhu

 On Friday, September 26, 2014 at 8:02 AM, praveen seluka wrote:

 Can someone explain the motivation behind passing executorAdded event to
 DAGScheduler ? *DAGScheduler *does *submitWaitingStages *when *executorAdded
 *method is called by *TaskSchedulerImpl*. I see some issue in the below
 code,

 *TaskSchedulerImpl.scala code*
 if (!executorsByHost.contains(o.host)) {
 executorsByHost(o.host) = new HashSet[String]()
 executorAdded(o.executorId, o.host)
 newExecAvail = true
   }

 Note that executorAdded is called only when there is a new host, not for
 every new executor. For instance, there can be two executors on the same
 host, but DAGScheduler's executorAdded is notified only for the new host -
 so only once in this case. If this is indeed an issue, I would like to
 submit a patch for this quickly. [cc Andrew Or]

 - Praveen






Re: PARSING_ERROR from kryo

2014-09-26 Thread Arun Ahuja
I am seeing the same error as well since upgrading to Spark 1.1:

14/09/26 15:35:05 ERROR executor.Executor: Exception in task 1032.0 in stage 5.1 (TID 22449)
com.esotericsoftware.kryo.KryoException: java.io.IOException: failed to uncompress the chunk: PARSING_ERROR(2)
at com.esotericsoftware.kryo.io.Input.fill(Input.java:142)
at com.esotericsoftware.kryo.io.Input.require(Input.java:155)
at com.esotericsoftware.kryo.io.Input.readInt(Input.java:337)
at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:109)
at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:610)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:721)
at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:133)
at org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:133)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
at org.apache.spark.storage.BlockManager$LazyProxyIterator$1.hasNext(BlockManager.scala:1082)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)

Out of 6000 tasks, some 5000 finish fine, so I don't believe there are any
issues with the serialization, and on some other datasets everything works
fine. Also, the same code and the same dataset worked fine with Spark 1.0.2.
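
As a diagnostic sketch (not a confirmed fix), the "failed to uncompress the chunk: PARSING_ERROR(2)" message appears to come from the snappy decompression path, so one way to narrow the problem down is to keep Kryo enabled but switch the block compression codec and re-run the failing job. The app name below is hypothetical.

import org.apache.spark.{SparkConf, SparkContext}

// Keep Kryo, but try a codec other than snappy to see whether the error persists.
val conf = new SparkConf()
  .setAppName("kryo-parsing-error-check") // hypothetical app name
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.io.compression.codec", "org.apache.spark.io.LZFCompressionCodec")

val sc = new SparkContext(conf)
// ... re-run the failing stage with this configuration ...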

On Mon, Sep 15, 2014 at 9:57 PM, npanj nitinp...@gmail.com wrote:

 Hi Andrew,

 No, I could not figure out the root cause. This seems to be a
 non-deterministic error... I didn't see the same error after rerunning the
 same program, but I noticed the same error in a different program.

 First I thought that this might be related to SPARK-2878, but @Graham replied
 that it looks irrelevant.






 --
 View this message in context:
 http://apache-spark-developers-list.1001551.n3.nabble.com/PARSING-ERROR-from-kryo-tp7944p8433.html
 Sent from the Apache Spark Developers List mailing list archive at
 Nabble.com.

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org




FYI: jenkins systems patched to fix bash exploit

2014-09-26 Thread shane knapp
all of our systems were affected by the shellshock bug, and i've just
patched everything w/the latest fix from redhat:

https://access.redhat.com/articles/1200223

we're not running bash.x86_64 0:4.1.2-15.el6_5.2 on all of our systems.

shane


Re: FYI: jenkins systems patched to fix bash exploit

2014-09-26 Thread shane knapp


 we're not running bash.x86_64 0:4.1.2-15.el6_5.2 on all of our systems.

 s/not/now

:)


thank you for reviewing our patches

2014-09-26 Thread Nicholas Chammas
I recently came across this mailing list post by Linus Torvalds
https://lkml.org/lkml/2004/12/20/255 about the value of reviewing even
“trivial” patches. The following passages stood out to me:

I think that much more important than the patch is the fact that people get
used to the notion that they can change the kernel

…

So please don’t stop. Yes, those trivial patches *are* a bother. Damn, they
are *horrible*. But at the same time, the devil is in the detail, and they
are needed in the long run. Both the patches themselves, and the people
that grew up on them.

Spark is the first (and currently only) open source project I contribute
regularly to. My first several PRs against the project, as simple as they
were, were definitely patches that I “grew up on”.

I appreciate the time and effort all the reviewers I’ve interacted with
have taken to work with me on my PRs, even when they are “trivial”. And I’m
sure that as I continue to contribute to this project there will be many
more patches that I will “grow up on”.

Thank you Patrick, Reynold, Josh, Davies, Michael, and everyone else who’s
taken time to review one of my patches. I appreciate it!

Nick
​


Re: thank you for reviewing our patches

2014-09-26 Thread Reynold Xin
Keep the patches coming :)


On Fri, Sep 26, 2014 at 1:50 PM, Nicholas Chammas 
nicholas.cham...@gmail.com wrote:

 I recently came across this mailing list post by Linus Torvalds
 https://lkml.org/lkml/2004/12/20/255 about the value of reviewing even
 “trivial” patches. The following passages stood out to me:

 I think that much more important than the patch is the fact that people get
 used to the notion that they can change the kernel

 …

 So please don’t stop. Yes, those trivial patches *are* a bother. Damn, they
 are *horrible*. But at the same time, the devil is in the detail, and they
 are needed in the long run. Both the patches themselves, and the people
 that grew up on them.

 Spark is the first (and currently only) open source project I contribute
 regularly to. My first several PRs against the project, as simple as they
 were, were definitely patches that I “grew up on”.

 I appreciate the time and effort all the reviewers I’ve interacted with
 have taken to work with me on my PRs, even when they are “trivial”. And I’m
 sure that as I continue to contribute to this project there will be many
 more patches that I will “grow up on”.

 Thank you Patrick, Reynold, Josh, Davies, Michael, and everyone else who’s
 taken time to review one of my patches. I appreciate it!

 Nick
 ​



Re: SparkSQL: map type MatchError when inserting into Hive table

2014-09-26 Thread Cheng Lian
Would you mind providing the DDL of this partitioned table together
with the query you tried? The stacktrace suggests that the query was
trying to cast a map into something else, which is not supported in
Spark SQL. And I doubt whether Hive supports casting a complex type to
some other type.


On 9/27/14 7:48 AM, Du Li wrote:

Hi,

I was loading data into a partitioned table on Spark 1.1.0 via the
beeline/thrift server. The table has complex data types such as map<string,string>
and array<map<string,string>>. The query is like "insert overwrite
table a partition (…) select …", and the select clause worked when run
separately. However, when running the insert query, there was an error as
follows.
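
Since the DDL was requested above, here is a hypothetical table and query of the same shape, runnable from a Spark 1.1 shell (the names are made up for illustration and are not the actual schema; `sc` is assumed to be the shell's SparkContext):

import org.apache.spark.sql.hive.HiveContext

// Hypothetical table with the complex column types described above
val hiveContext = new HiveContext(sc)

hiveContext.sql("""
  CREATE TABLE IF NOT EXISTS complex_tbl (
    props   MAP<STRING, STRING>,
    entries ARRAY<MAP<STRING, STRING>>
  )
  PARTITIONED BY (dt STRING)
""")

// The SELECT alone works; it is the INSERT OVERWRITE that reportedly hits the MatchError.
hiveContext.sql("""
  INSERT OVERWRITE TABLE complex_tbl PARTITION (dt = '2014-09-26')
  SELECT props, entries FROM some_source_tbl
""")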

The source code of Cast.scala seems to handle only the primitive data
types, which is perhaps why the MatchError was thrown.
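
As a generic Scala illustration (simplified stand-in types, not the actual Catalyst source) of why such a non-exhaustive match surfaces as scala.MatchError only at runtime:

// Simplified stand-in types; the real ones live in org.apache.spark.sql.catalyst.types.
sealed trait DataType
case object StringType  extends DataType
case object IntegerType extends DataType
case class MapType(keyType: DataType, valueType: DataType) extends DataType

// A cast table that, like the behaviour described above, covers only primitive targets:
def castFunction(to: DataType): Any => Any = to match {
  case StringType  => (v: Any) => v.toString
  case IntegerType => (v: Any) => v.toString.toInt
  // no case for MapType: the match is non-exhaustive, so the failure shows up
  // only at runtime as scala.MatchError rather than as a compile error
}

// castFunction(MapType(StringType, StringType)) // throws scala.MatchError: MapType(...)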

I just wonder if this is still work in progress, or whether I should do it
differently.

Thanks,
Du



scala.MatchError: MapType(StringType,StringType,true) (of class org.apache.spark.sql.catalyst.types.MapType)
 org.apache.spark.sql.catalyst.expressions.Cast.cast$lzycompute(Cast.scala:247)
 org.apache.spark.sql.catalyst.expressions.Cast.cast(Cast.scala:247)
 org.apache.spark.sql.catalyst.expressions.Cast.eval(Cast.scala:263)
 org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:84)
 org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:66)
 org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:50)
 scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
 scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
 org.apache.spark.sql.hive.execution.InsertIntoHiveTable.org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1(InsertIntoHiveTable.scala:149)
 org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$1.apply(InsertIntoHiveTable.scala:158)
 org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$1.apply(InsertIntoHiveTable.scala:158)
 org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
 org.apache.spark.scheduler.Task.run(Task.scala:54)
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 java.lang.Thread.run(Thread.java:722)






-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org




-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org


