Re: is Branch-1.1 SBT build broken for yarn-alpha ?

2014-08-21 Thread witgo
There's a related discussion 
https://issues.apache.org/jira/browse/SPARK-2815




------ Original Message ------
From: Chester Chen ches...@alpinenow.com
Date: Thursday, August 21, 2014, 7:42
To: dev dev@spark.apache.org
Subject: Re: is Branch-1.1 SBT build broken for yarn-alpha ?



Just tried on master branch, and the master branch works fine for yarn-alpha


On Wed, Aug 20, 2014 at 4:39 PM, Chester Chen ches...@alpinenow.com wrote:

 I just updated today's build and tried branch-1.1 for both yarn and
 yarn-alpha.

 For the yarn build, this command seems to work fine.

 sbt/sbt -Pyarn -Dhadoop.version=2.3.0-cdh5.0.1 projects

 for yarn-alpha

 sbt/sbt -Pyarn-alpha -Dhadoop.version=2.0.5-alpha projects

 I got the following

 Any ideas?


 Chester

 ᚛ |branch-1.1|$  *sbt/sbt -Pyarn-alpha -Dhadoop.version=2.0.5-alpha
 projects*

 Using /Library/Java/JavaVirtualMachines/1.6.0_51-b11-457.jdk/Contents/Home
 as default JAVA_HOME.

 Note, this will be overridden by -java-home if it is set.

 [info] Loading project definition from
 /Users/chester/projects/spark/project/project

 [info] Loading project definition from
 /Users/chester/.sbt/0.13/staging/ec3aa8f39111944cc5f2/sbt-pom-reader/project

 [warn] Multiple resolvers having different access mechanism configured
 with same name 'sbt-plugin-releases'. To avoid conflict, Remove duplicate
 project resolvers (`resolvers`) or rename publishing resolver (`publishTo`).

 [info] Loading project definition from
 /Users/chester/projects/spark/project

 org.apache.maven.model.building.ModelBuildingException: 1 problem was
 encountered while building the effective model for
 org.apache.spark:spark-yarn-alpha_2.10:1.1.0

 [FATAL] Non-resolvable parent POM: Could not find artifact
 org.apache.spark:yarn-parent_2.10:pom:1.1.0 in central
 (http://repo.maven.apache.org/maven2) and 'parent.relativePath' points at
 wrong local POM @ line 20, column 11


  at
 org.apache.maven.model.building.DefaultModelProblemCollector.newModelBuildingException(DefaultModelProblemCollector.java:195)

 at
 org.apache.maven.model.building.DefaultModelBuilder.readParentExternally(DefaultModelBuilder.java:841)

 at
 org.apache.maven.model.building.DefaultModelBuilder.readParent(DefaultModelBuilder.java:664)

 at
 org.apache.maven.model.building.DefaultModelBuilder.build(DefaultModelBuilder.java:310)

 at
 org.apache.maven.model.building.DefaultModelBuilder.build(DefaultModelBuilder.java:232)

 at
 com.typesafe.sbt.pom.MvnPomResolver.loadEffectivePom(MavenPomResolver.scala:61)

 at com.typesafe.sbt.pom.package$.loadEffectivePom(package.scala:41)

 at
 com.typesafe.sbt.pom.MavenProjectHelper$.makeProjectTree(MavenProjectHelper.scala:128)

 at
 com.typesafe.sbt.pom.MavenProjectHelper$$anonfun$12.apply(MavenProjectHelper.scala:129)

 at
 com.typesafe.sbt.pom.MavenProjectHelper$$anonfun$12.apply(MavenProjectHelper.scala:129)

 at
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)

 at
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)

 at
 scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)

 at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)

 at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)

 at scala.collection.AbstractTraversable.map(Traversable.scala:105)

 at
 com.typesafe.sbt.pom.MavenProjectHelper$.makeProjectTree(MavenProjectHelper.scala:129)

 at
 com.typesafe.sbt.pom.MavenProjectHelper$$anonfun$12.apply(MavenProjectHelper.scala:129)

 at
 com.typesafe.sbt.pom.MavenProjectHelper$$anonfun$12.apply(MavenProjectHelper.scala:129)

 at
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)

 at
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)

 at
 scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)

 at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)

 at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)

 at scala.collection.AbstractTraversable.map(Traversable.scala:105)

 at
 com.typesafe.sbt.pom.MavenProjectHelper$.makeProjectTree(MavenProjectHelper.scala:129)

 at
 com.typesafe.sbt.pom.MavenProjectHelper$.makeReactorProject(MavenProjectHelper.scala:49)

 at
 com.typesafe.sbt.pom.PomBuild$class.projectDefinitions(PomBuild.scala:28)

 at SparkBuild$.projectDefinitions(SparkBuild.scala:165)

 at sbt.Load$.sbt$Load$$projectsFromBuild(Load.scala:458)

 at sbt.Load$$anonfun$24.apply(Load.scala:415)

 at sbt.Load$$anonfun$24.apply(Load.scala:415)

 at scala.collection.immutable.Stream.flatMap(Stream.scala:442)

 at sbt.Load$.loadUnit(Load.scala:415)

 at sbt.Load$$anonfun$15$$anonfun$apply$11.apply(Load.scala:256)

 at sbt.Load$$anonfun$15$$anonfun$apply$11.apply(Load.scala:256)

 at
 

Hang on Executor classloader lookup for the remote REPL URL classloader

2014-08-21 Thread Andrew Ash
Hi Spark devs,

I'm seeing a stacktrace where the classloader that reads from the REPL is
hung, and blocking all progress on that executor.  Below is that hung
thread's stacktrace, and also the stacktrace of another hung thread.

I thought maybe there was an issue with the REPL's JVM on the other side,
but didn't see anything useful in that stacktrace either.

Any ideas what I should be looking for?

Thanks!
Andrew


Executor task launch worker-0 daemon prio=10 tid=0x7f780c208000
nid=0x6ae9 runnable [0x7f78c2eeb000]
   java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
- locked 0x7f7e13ea9560 (a java.io.BufferedInputStream)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
at
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
- locked 0x7f7e13e9eeb0 (a
sun.net.www.protocol.http.HttpURLConnection)
at java.net.URL.openStream(URL.java:1037)
at
org.apache.spark.repl.ExecutorClassLoader.findClassLocally(ExecutorClassLoader.scala:86)
at
org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:63)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
- locked 0x7f7fc9018980 (a
org.apache.spark.repl.ExecutorClassLoader)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.avro.util.ClassUtils.forName(ClassUtils.java:102)
at org.apache.avro.util.ClassUtils.forName(ClassUtils.java:82)
at
org.apache.avro.specific.SpecificData.getClass(SpecificData.java:132)
at
org.apache.avro.specific.SpecificDatumReader.setSchema(SpecificDatumReader.java:69)
at
org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:126)
at
org.apache.avro.file.DataFileReader.init(DataFileReader.java:97)
at
org.apache.avro.file.DataFileReader.openReader(DataFileReader.java:59)
at
org.apache.avro.mapred.AvroRecordReader.init(AvroRecordReader.java:41)
at
org.apache.avro.mapred.AvroInputFormat.getRecordReader(AvroInputFormat.java:71)
at
org.apache.spark.rdd.HadoopRDD$$anon$1.init(HadoopRDD.scala:193)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:184)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:93)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)


And the other threads are stuck on the Class.forName0() method too:

Executor task launch worker-4 daemon prio=10 tid=0x7f780c20f000
nid=0x6aed waiting for monitor entry [0x7f78c2ae8000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.avro.util.ClassUtils.forName(ClassUtils.java:102)
at org.apache.avro.util.ClassUtils.forName(ClassUtils.java:79)
at
org.apache.avro.specific.SpecificData.getClass(SpecificData.java:132)
at
org.apache.avro.specific.SpecificDatumReader.setSchema(SpecificDatumReader.java:69)
at
org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:126)
at
org.apache.avro.file.DataFileReader.init(DataFileReader.java:97)
at
org.apache.avro.file.DataFileReader.openReader(DataFileReader.java:59)
at
org.apache.avro.mapred.AvroRecordReader.init(AvroRecordReader.java:41)
at
org.apache.avro.mapred.AvroInputFormat.getRecordReader(AvroInputFormat.java:71)
at
org.apache.spark.rdd.HadoopRDD$$anon$1.init(HadoopRDD.scala:193)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:184)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:93)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)



[SNAPSHOT] Snapshot2 of Spark 1.1 has been posted

2014-08-21 Thread Patrick Wendell
Hi All,

I've packaged and published a snapshot release of Spark 1.1 for testing.
This is very close to RC1 and we are distributing it for testing. Please
test this and report any issues on this thread.

The tag of this release is v1.1.0-snapshot1 (commit e1535ad3):
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=e1535ad3c6f7400f2b7915ea91da9c60510557ba

The release files, including signatures, digests, etc can be found at:
http://people.apache.org/~pwendell/spark-1.1.0-snapshot2/
http://people.apache.org/~pwendell/spark-1.1.0-snapshot1/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
[NOTE: Apache Sonatype is down preventing us from cutting this]
https://repository.apache.org/content/repositories/orgapachespark-1026/
https://repository.apache.org/content/repositories/orgapachespark-1024/


To learn more about Apache Spark, please see
http://spark.apache.org/


Re: [SNAPSHOT] Snapshot2 of Spark 1.1 has been posted

2014-08-21 Thread Patrick Wendell
The docs for this release are also available here:

http://people.apache.org/~pwendell/spark-1.1.0-snapshot2-docs/


On Thu, Aug 21, 2014 at 1:12 AM, Patrick Wendell pwend...@gmail.com wrote:

 Hi All,

 I've packaged and published a snapshot release of Spark 1.1 for testing.
 This is very close to RC1 and we are distributing it for testing. Please
 test this and report any issues on this thread.

 The tag of this release is v1.1.0-snapshot1 (commit e1535ad3):
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=e1535ad3c6f7400f2b7915ea91da9c60510557ba

 The release files, including signatures, digests, etc can be found at:
 http://people.apache.org/~pwendell/spark-1.1.0-snapshot2/
 http://people.apache.org/~pwendell/spark-1.1.0-snapshot1/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 [NOTE: Apache Sonatype is down preventing us from cutting this]
 https://repository.apache.org/content/repositories/orgapachespark-1026/
 https://repository.apache.org/content/repositories/orgapachespark-1024/


 To learn more about Apache Spark, please see
 http://spark.apache.org/



Re: is Branch-1.1 SBT build broken for yarn-alpha ?

2014-08-21 Thread Sean Owen
Maven is just telling you that there is no version 1.1.0 of
yarn-parent, and indeed, it has not been released. To build the branch
you would need to mvn install to compile and make local copies of the
artifacts available along the way. (You may have these for
1.1.0-SNAPSHOT locally already.) Use Maven, not SBT, for building
releases.

On Thu, Aug 21, 2014 at 12:39 AM, Chester Chen ches...@alpinenow.com wrote:
 [FATAL] Non-resolvable parent POM: Could not find artifact
 org.apache.spark:yarn-parent_2.10:pom:1.1.0 in central
 (http://repo.maven.apache.org/maven2) and 'parent.relativePath' points at
 wrong local POM @ line 20, column 11

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Spark Contribution

2014-08-21 Thread Maisnam Ns
Hi,

Can someone help me with some links on how to contribute to Spark?

Regards
mns


Kinesis streaming integration in upcoming 1.1

2014-08-21 Thread Aniket Bhatnagar
Hi everyone

I started looking at Kinesis integration and it looks promising.  However,
I feel like it can be improved. Here are my thoughts:

1. It assumes that AWS credentials are provided
by DefaultAWSCredentialsProviderChain and there is no way to change this
behavior. I would have liked the ability to provide a different
AWSCredentialsProvider.

2. I feel like the modules in extras need to be independent from the Spark
build and should perhaps live in a separate repository (or repositories). I
had to download the most recent checkout of Spark and slap kinesis-asl onto
Spark 1.0.2 to create a custom spark-streaming-kinesis-asl_2.10-1.0.2.jar that
I can use in my Spark jobs. Ideally, people would want the extra modules to be
cross-built against different versions of Spark. Having independent
repositories would let us deliver builds of the extras packages faster than
Spark releases, and they would be readily available to earlier versions of
Spark. This would also free up Spark developers to focus on enhancements in
the core framework instead of managing spark-* integration pull requests.

3. Maybe it's just me, but I would have liked a Context-like API for
creating Kinesis streams instead of using KinesisUtils. It makes things a
little more consistent with the rest of the Spark API. We could have a
KinesisStreamingContext that goes like this (a usage sketch follows after
this list):

class KinesisStreamingContext(@transient ssc: StreamingContext,
    endpointUrl: String,
    defaultCredentialsProvider: AWSCredentialsProvider) {

  def createStream(streamName: String,
      checkpointInterval: Duration,
      initialPositionInStream: InitialPositionInStream,
      storageLevel: StorageLevel,
      credentialsProvider: AWSCredentialsProvider = defaultCredentialsProvider) { ... }
}

4. The example KinesisWordCountASL creates numShards receiver instances,
which makes sense. Maybe the API should provide the ability to specify
parallelism and default it to numShards?
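
For illustration, here is a hypothetical usage sketch of the
KinesisStreamingContext proposed in point 3. The class, its constructor, and
createStream's signature are assumptions that simply mirror the proposal
above and do not exist in Spark today; the AWS SDK classes
DefaultAWSCredentialsProviderChain and InitialPositionInStream are real.

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import com.amazonaws.auth.DefaultAWSCredentialsProviderChain
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream

val sparkConf = new SparkConf().setAppName("kinesis-sketch")
val ssc = new StreamingContext(sparkConf, Seconds(10))

// Hypothetical: one context per Kinesis endpoint, carrying a default
// credentials provider.
val kinesisCtx = new KinesisStreamingContext(ssc,
  "https://kinesis.us-east-1.amazonaws.com",
  new DefaultAWSCredentialsProviderChain())

// Hypothetical: createStream mirrors the signature proposed above and is
// assumed to return a DStream; credentials default to the context's provider.
val events = kinesisCtx.createStream(
  streamName = "events",
  checkpointInterval = Seconds(10),
  initialPositionInStream = InitialPositionInStream.LATEST,
  storageLevel = StorageLevel.MEMORY_AND_DISK_2)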

I can submit pull requests for some of the above items, provided the
community agrees and nobody else is working on it.

Thanks,
Aniket


Re: Spark SQL Query and join different data sources.

2014-08-21 Thread chutium
As far as I know, HQL queries try to find the schema info of all the tables
in the query from the Hive metastore, so it is not possible to join tables
registered in sqlContext using hiveContext.hql

but this should work:

hiveContext.hql("select ...").regAsTable("a")
sqlContext.jsonFile("xxx").regAsTable("b")

then

sqlContext.sql("a join b")
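
A slightly more complete sketch of the same idea, assuming the Spark 1.1-era
API (regAsTable above is shorthand for SchemaRDD.registerAsTable; the paths,
table names, and columns are made up for illustration). Whether the final
query can actually resolve a table registered through the HiveContext is
exactly what is questioned later in this thread:

import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.hive.HiveContext

val sqlContext = new SQLContext(sc)
val hiveContext = new HiveContext(sc)

// Register the result of a Hive query and a JSON file as tables, each in
// its own context's catalog.
hiveContext.hql("SELECT id, name FROM some_hive_table").registerAsTable("a")
sqlContext.jsonFile("hdfs:///data/b.json").registerAsTable("b")

// Attempt the cross-source join through the plain SQLContext.
val joined = sqlContext.sql("SELECT a.id, b.name FROM a JOIN b ON a.id = b.id")
joined.collect().foreach(println)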


I created a ticket, SPARK-2710, to add ResultSets from a JDBC connection as a
new data source, but there is no predicate push-down yet; also, it is not
available for HQL.

So, if you are looking for something that can query different data sources
with full SQL-92 syntax, Facebook Presto is still the only choice; they have
some kind of JDBC connector in development, and there are some unofficial
implementations...

But I am looking forward to seeing the progress of Spark SQL. After
SPARK-2179, SQLContext can handle any kind of structured data with a sequence
of DataTypes as the schema, although turning the data into Rows is still a
little bit tricky...



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-SQL-Query-and-join-different-data-sources-tp7914p7937.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: is Branch-1.1 SBT build broken for yarn-alpha ?

2014-08-21 Thread Mridul Muralidharan
Weird that Patrick did not face this while creating the RC.
Essentially the yarn alpha pom.xml has not been updated properly in
the 1.1 branch.

Just change the version to '1.1.1-SNAPSHOT' in yarn/alpha/pom.xml (to
make it the same as every other pom).


Regards,
Mridul


On Thu, Aug 21, 2014 at 5:09 AM, Chester Chen ches...@alpinenow.com wrote:
 I just updated today's build and tried branch-1.1 for both yarn and
 yarn-alpha.

 For yarn build, this command seem to work fine.

 sbt/sbt -Pyarn -Dhadoop.version=2.3.0-cdh5.0.1 projects

 for yarn-alpha

 sbt/sbt -Pyarn-alpha -Dhadoop.version=2.0.5-alpha projects

 I got the following

 Any ideas


 Chester

 ᚛ |branch-1.1|$  *sbt/sbt -Pyarn-alpha -Dhadoop.version=2.0.5-alpha
 projects*

 Using /Library/Java/JavaVirtualMachines/1.6.0_51-b11-457.jdk/Contents/Home
 as default JAVA_HOME.

 Note, this will be overridden by -java-home if it is set.

 [info] Loading project definition from
 /Users/chester/projects/spark/project/project

 [info] Loading project definition from
 /Users/chester/.sbt/0.13/staging/ec3aa8f39111944cc5f2/sbt-pom-reader/project

 [warn] Multiple resolvers having different access mechanism configured with
 same name 'sbt-plugin-releases'. To avoid conflict, Remove duplicate
 project resolvers (`resolvers`) or rename publishing resolver (`publishTo`).

 [info] Loading project definition from /Users/chester/projects/spark/project

 org.apache.maven.model.building.ModelBuildingException: 1 problem was
 encountered while building the effective model for
 org.apache.spark:spark-yarn-alpha_2.10:1.1.0

 [FATAL] Non-resolvable parent POM: Could not find artifact
 org.apache.spark:yarn-parent_2.10:pom:1.1.0 in central
 (http://repo.maven.apache.org/maven2) and 'parent.relativePath' points at
 wrong local POM @ line 20, column 11


  at
 org.apache.maven.model.building.DefaultModelProblemCollector.newModelBuildingException(DefaultModelProblemCollector.java:195)

 at
 org.apache.maven.model.building.DefaultModelBuilder.readParentExternally(DefaultModelBuilder.java:841)

 at
 org.apache.maven.model.building.DefaultModelBuilder.readParent(DefaultModelBuilder.java:664)

 at
 org.apache.maven.model.building.DefaultModelBuilder.build(DefaultModelBuilder.java:310)

 at
 org.apache.maven.model.building.DefaultModelBuilder.build(DefaultModelBuilder.java:232)

 at
 com.typesafe.sbt.pom.MvnPomResolver.loadEffectivePom(MavenPomResolver.scala:61)

 at com.typesafe.sbt.pom.package$.loadEffectivePom(package.scala:41)

 at
 com.typesafe.sbt.pom.MavenProjectHelper$.makeProjectTree(MavenProjectHelper.scala:128)

 at
 com.typesafe.sbt.pom.MavenProjectHelper$$anonfun$12.apply(MavenProjectHelper.scala:129)

 at
 com.typesafe.sbt.pom.MavenProjectHelper$$anonfun$12.apply(MavenProjectHelper.scala:129)

 at
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)

 at
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)

 at
 scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)

 at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)

 at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)

 at scala.collection.AbstractTraversable.map(Traversable.scala:105)

 at
 com.typesafe.sbt.pom.MavenProjectHelper$.makeProjectTree(MavenProjectHelper.scala:129)

 at
 com.typesafe.sbt.pom.MavenProjectHelper$$anonfun$12.apply(MavenProjectHelper.scala:129)

 at
 com.typesafe.sbt.pom.MavenProjectHelper$$anonfun$12.apply(MavenProjectHelper.scala:129)

 at
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)

 at
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)

 at
 scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)

 at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)

 at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)

 at scala.collection.AbstractTraversable.map(Traversable.scala:105)

 at
 com.typesafe.sbt.pom.MavenProjectHelper$.makeProjectTree(MavenProjectHelper.scala:129)

 at
 com.typesafe.sbt.pom.MavenProjectHelper$.makeReactorProject(MavenProjectHelper.scala:49)

 at com.typesafe.sbt.pom.PomBuild$class.projectDefinitions(PomBuild.scala:28)

 at SparkBuild$.projectDefinitions(SparkBuild.scala:165)

 at sbt.Load$.sbt$Load$$projectsFromBuild(Load.scala:458)

 at sbt.Load$$anonfun$24.apply(Load.scala:415)

 at sbt.Load$$anonfun$24.apply(Load.scala:415)

 at scala.collection.immutable.Stream.flatMap(Stream.scala:442)

 at sbt.Load$.loadUnit(Load.scala:415)

 at sbt.Load$$anonfun$15$$anonfun$apply$11.apply(Load.scala:256)

 at sbt.Load$$anonfun$15$$anonfun$apply$11.apply(Load.scala:256)

 at
 sbt.BuildLoader$$anonfun$componentLoader$1$$anonfun$apply$4$$anonfun$apply$5$$anonfun$apply$6.apply(BuildLoader.scala:93)

 at
 

Re: is Branch-1.1 SBT build broken for yarn-alpha ?

2014-08-21 Thread Chester @work
Do we have Jenkins tests for these? It should be pretty easy to set up a job
just to test the basic build.

Sent from my iPhone

 On Aug 21, 2014, at 6:45 AM, Mridul Muralidharan mri...@gmail.com wrote:
 
 Weird that Patrick did not face this while creating the RC.
 Essentially the yarn alpha pom.xml has not been updated properly in
 the 1.1 branch.
 
 Just change version to '1.1.1-SNAPSHOT' for yarn/alpha/pom.xml (to
 make it same as any other pom).
 
 
 Regards,
 Mridul
 
 
 On Thu, Aug 21, 2014 at 5:09 AM, Chester Chen ches...@alpinenow.com wrote:
 I just updated today's build and tried branch-1.1 for both yarn and
 yarn-alpha.
 
 For yarn build, this command seem to work fine.
 
 sbt/sbt -Pyarn -Dhadoop.version=2.3.0-cdh5.0.1 projects
 
 for yarn-alpha
 
 sbt/sbt -Pyarn-alpha -Dhadoop.version=2.0.5-alpha projects
 
 I got the following
 
 Any ideas
 
 
 Chester
 
 ᚛ |branch-1.1|$  *sbt/sbt -Pyarn-alpha -Dhadoop.version=2.0.5-alpha
 projects*
 
 Using /Library/Java/JavaVirtualMachines/1.6.0_51-b11-457.jdk/Contents/Home
 as default JAVA_HOME.
 
 Note, this will be overridden by -java-home if it is set.
 
 [info] Loading project definition from
 /Users/chester/projects/spark/project/project
 
 [info] Loading project definition from
 /Users/chester/.sbt/0.13/staging/ec3aa8f39111944cc5f2/sbt-pom-reader/project
 
 [warn] Multiple resolvers having different access mechanism configured with
 same name 'sbt-plugin-releases'. To avoid conflict, Remove duplicate
 project resolvers (`resolvers`) or rename publishing resolver (`publishTo`).
 
 [info] Loading project definition from /Users/chester/projects/spark/project
 
 org.apache.maven.model.building.ModelBuildingException: 1 problem was
 encountered while building the effective model for
 org.apache.spark:spark-yarn-alpha_2.10:1.1.0
 
  [FATAL] Non-resolvable parent POM: Could not find artifact
  org.apache.spark:yarn-parent_2.10:pom:1.1.0 in central
  (http://repo.maven.apache.org/maven2) and 'parent.relativePath' points at
  wrong local POM @ line 20, column 11
 
 
 at
 org.apache.maven.model.building.DefaultModelProblemCollector.newModelBuildingException(DefaultModelProblemCollector.java:195)
 
 at
 org.apache.maven.model.building.DefaultModelBuilder.readParentExternally(DefaultModelBuilder.java:841)
 
 at
 org.apache.maven.model.building.DefaultModelBuilder.readParent(DefaultModelBuilder.java:664)
 
 at
 org.apache.maven.model.building.DefaultModelBuilder.build(DefaultModelBuilder.java:310)
 
 at
 org.apache.maven.model.building.DefaultModelBuilder.build(DefaultModelBuilder.java:232)
 
 at
 com.typesafe.sbt.pom.MvnPomResolver.loadEffectivePom(MavenPomResolver.scala:61)
 
 at com.typesafe.sbt.pom.package$.loadEffectivePom(package.scala:41)
 
 at
 com.typesafe.sbt.pom.MavenProjectHelper$.makeProjectTree(MavenProjectHelper.scala:128)
 
 at
 com.typesafe.sbt.pom.MavenProjectHelper$$anonfun$12.apply(MavenProjectHelper.scala:129)
 
 at
 com.typesafe.sbt.pom.MavenProjectHelper$$anonfun$12.apply(MavenProjectHelper.scala:129)
 
 at
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
 
 at
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
 
 at
 scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
 
 at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
 
 at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
 
 at scala.collection.AbstractTraversable.map(Traversable.scala:105)
 
 at
 com.typesafe.sbt.pom.MavenProjectHelper$.makeProjectTree(MavenProjectHelper.scala:129)
 
 at
 com.typesafe.sbt.pom.MavenProjectHelper$$anonfun$12.apply(MavenProjectHelper.scala:129)
 
 at
 com.typesafe.sbt.pom.MavenProjectHelper$$anonfun$12.apply(MavenProjectHelper.scala:129)
 
 at
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
 
 at
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
 
 at
 scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
 
 at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
 
 at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
 
 at scala.collection.AbstractTraversable.map(Traversable.scala:105)
 
 at
 com.typesafe.sbt.pom.MavenProjectHelper$.makeProjectTree(MavenProjectHelper.scala:129)
 
 at
 com.typesafe.sbt.pom.MavenProjectHelper$.makeReactorProject(MavenProjectHelper.scala:49)
 
 at com.typesafe.sbt.pom.PomBuild$class.projectDefinitions(PomBuild.scala:28)
 
 at SparkBuild$.projectDefinitions(SparkBuild.scala:165)
 
 at sbt.Load$.sbt$Load$$projectsFromBuild(Load.scala:458)
 
 at sbt.Load$$anonfun$24.apply(Load.scala:415)
 
 at sbt.Load$$anonfun$24.apply(Load.scala:415)
 
 at scala.collection.immutable.Stream.flatMap(Stream.scala:442)
 
 at sbt.Load$.loadUnit(Load.scala:415)
 
 at sbt.Load$$anonfun$15$$anonfun$apply$11.apply(Load.scala:256)
 
 at 

RE: Spark SQL Query and join different data sources.

2014-08-21 Thread Yan Zhou.sc
I doubt it will work as expected.

Note that hiveContext.hql("select ...").regAsTable("a") will create a SchemaRDD
and then register that SchemaRDD with the (Hive) catalog, while
sqlContext.jsonFile("xxx").regAsTable("b") will create a SchemaRDD and then
register that SchemaRDD with the Spark SQL catalog (SimpleCatalog).
The logical plans of the two SchemaRDDs are of the same type, but the physical
plans are, and should be, different.
The issue is that the transformation of logical plans into physical plans is
controlled by the strategies of the contexts: the SQLContext transforms a
logical plan into a physical plan suitable for executing the SchemaRDD against
an in-memory data source, while the HiveContext transforms a logical plan into
a physical plan suitable for executing against a Hive data source. So

sqlContext.sql("a join b") will generate a physical plan for the in-memory
data source for both a and b; and
hiveContext.sql("a join b") will generate a physical plan for the Hive data
source for both a and b.

What's really needed is storage transparency at the semantic layer if
Spark SQL wants to go the data federation route.


If one could manage to create a SchemaRDD on Hive data through just the
SQLContext, not the HiveContext (which is a subclass of SQLContext), as
seemingly hinted by the Spark SQL web page https://spark.apache.org/sql/ in
the following code snippet:

sqlCtx.jsonFile("s3n://...")
  .registerAsTable("json")
schema_rdd = sqlCtx.sql("""
  SELECT *
  FROM hiveTable
  JOIN json ...""")

he/she might be able to perform the join of data sets of different types. I
just have not tried it.


In terms of SQL-92 conformance, Presto might be better than HiveQL, while in
terms of federation, Hive is actually very good at it.




-Original Message-
From: chutium [mailto:teng@gmail.com] 
Sent: Thursday, August 21, 2014 4:35 AM
To: d...@spark.incubator.apache.org
Subject: Re: Spark SQL Query and join different data sources.

As far as I know, HQL queries try to find the schema info of all the tables
in the query from the Hive metastore, so it is not possible to join tables
registered in sqlContext using hiveContext.hql

but this should work:

hiveContext.hql("select ...").regAsTable("a")
sqlContext.jsonFile("xxx").regAsTable("b")

then

sqlContext.sql("a join b")


I created a ticket, SPARK-2710, to add ResultSets from a JDBC connection as a
new data source, but there is no predicate push-down yet; also, it is not
available for HQL.

So, if you are looking for something that can query different data sources
with full SQL-92 syntax, Facebook Presto is still the only choice; they have
some kind of JDBC connector in development, and there are some unofficial
implementations...

But I am looking forward to seeing the progress of Spark SQL. After
SPARK-2179, SQLContext can handle any kind of structured data with a sequence
of DataTypes as the schema, although turning the data into Rows is still a
little bit tricky...



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-SQL-Query-and-join-different-data-sources-tp7914p7937.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional 
commands, e-mail: dev-h...@spark.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Spark Contribution

2014-08-21 Thread Henry Saputra
The Apache Spark wiki on how to contribute should be great place to
start: https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark

- Henry

On Thu, Aug 21, 2014 at 3:25 AM, Maisnam Ns maisnam...@gmail.com wrote:
 Hi,

 Can someone help me with some links on how to contribute for Spark

 Regards
 mns

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



PARSING_ERROR from kryo

2014-08-21 Thread npanj
Hi All,

I am getting a PARSING_ERROR while running my job on code checked out up to
commit db56f2df1b8027171da1b8d2571d1f2ef1e103b6. I am running this job on EC2.

Any idea if there is something wrong with my config?

Here is my config:
--
.set("spark.executor.extraJavaOptions",
     "-XX:+UseCompressedOops -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps")
.set("spark.storage.memoryFraction", "0.2")
.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
.set("spark.kryo.registrator", "org.apache.spark.graphx.GraphKryoRegistrator")
.set("spark.akka.frameSize", "20")
.set("spark.akka.timeout", "300")
.set("spark.shuffle.memoryFraction", "0.5")
.set("spark.core.connection.ack.wait.timeout", "1800")
--



--
Job aborted due to stage failure: Task 947 in stage 11.0 failed 4 times,
most recent failure: Lost task 947.3 in stage 11.0 (TID 12750,
ip-10-167-149-118.ec2.internal): com.esotericsoftware.kryo.KryoException:
java.io.IOException: failed to uncompress the chunk: PARSING_ERROR(2)
Serialization trace:
vids (org.apache.spark.graphx.impl.VertexAttributeBlock)
com.esotericsoftware.kryo.io.Input.fill(Input.java:142)
com.esotericsoftware.kryo.io.Input.require(Input.java:169)
com.esotericsoftware.kryo.io.Input.readLong_slow(Input.java:719)
com.esotericsoftware.kryo.io.Input.readLong(Input.java:665)
   
com.esotericsoftware.kryo.serializers.DefaultArraySerializers$LongArraySerializer.read(DefaultArraySerializers.java:127)
   
com.esotericsoftware.kryo.serializers.DefaultArraySerializers$LongArraySerializer.read(DefaultArraySerializers.java:107)
com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699)
   
com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
   
com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:43)
com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:34)
com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
   
org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:119)
   
org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:129)
org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
   
org.apache.spark.storage.BlockManager$LazyProxyIterator$1.hasNext(BlockManager.scala:1038)
scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
   
org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
   
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
scala.collection.Iterator$class.foreach(Iterator.scala:727)
scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
   
org.apache.spark.graphx.impl.VertexPartitionBaseOps.innerJoinKeepLeft(VertexPartitionBaseOps.scala:192)
   
org.apache.spark.graphx.impl.EdgePartition.updateVertices(EdgePartition.scala:78)
   
org.apache.spark.graphx.impl.ReplicatedVertexView$$anonfun$2$$anonfun$apply$1.apply(ReplicatedVertexView.scala:75)
   
org.apache.spark.graphx.impl.ReplicatedVertexView$$anonfun$2$$anonfun$apply$1.apply(ReplicatedVertexView.scala:73)
scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
scala.collection.Iterator$class.foreach(Iterator.scala:727)
scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
   
org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:57)
   
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:147)
   
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:97)
org.apache.spark.scheduler.Task.run(Task.scala:51)
   
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:189)
   
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)
--



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/PARSING-ERROR-from-kryo-tp7944.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Spark Contribution

2014-08-21 Thread Nicholas Chammas
We should add this link to the readme on GitHub btw.

On Thursday, August 21, 2014, Henry Saputra henry.sapu...@gmail.com wrote:

 The Apache Spark wiki on how to contribute should be great place to
 start:
 https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark

 - Henry

 On Thu, Aug 21, 2014 at 3:25 AM, Maisnam Ns maisnam...@gmail.com
 javascript:; wrote:
  Hi,
 
  Can someone help me with some links on how to contribute for Spark
 
  Regards
  mns

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org javascript:;
 For additional commands, e-mail: dev-h...@spark.apache.org javascript:;




Re: Lost executor on YARN ALS iterations

2014-08-21 Thread Debasish Das
Sandy,

I put spark.yarn.executor.memoryOverhead 1024 in spark-defaults.conf, but I
don't see the property under Spark Properties on the web UI's Environment tab.

Does it need to go in spark-env.sh ?

Thanks.
Deb
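
(For reference, a minimal sketch of setting the same value programmatically,
assuming the Spark 1.x property name spark.yarn.executor.memoryOverhead, in
MB; values set on the SparkConf that builds the SparkContext do show up under
Spark Properties in the web UI, which may help narrow down why the
spark-defaults.conf entry is not visible:)

import org.apache.spark.SparkConf

// Extra headroom requested from YARN beyond the executor JVM heap, in MB.
val conf = new SparkConf()
  .setAppName("als-on-yarn")
  .set("spark.yarn.executor.memoryOverhead", "1024")

// spark-submit also accepts the same property on the command line, e.g.
//   --conf spark.yarn.executor.memoryOverhead=1024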


On Wed, Aug 20, 2014 at 12:39 AM, Sandy Ryza sandy.r...@cloudera.com
wrote:

 Hi Debasish,

 The fix is to raise spark.yarn.executor.memoryOverhead until this goes
 away.  This controls the buffer between the JVM heap size and the amount of
 memory requested from YARN (JVMs can take up memory beyond their heap
 size). You should also make sure that, in the YARN NodeManager
 configuration, yarn.nodemanager.vmem-check-enabled is set to false.

 -Sandy


 On Wed, Aug 20, 2014 at 12:27 AM, Debasish Das debasish.da...@gmail.com
 wrote:

 I could reproduce the issue in both 1.0 and 1.1 using YARN... so this is
 definitely a YARN-related problem...

 At least for me, right now the only deployment option possible is standalone...



 On Tue, Aug 19, 2014 at 11:29 PM, Xiangrui Meng men...@gmail.com wrote:

 Hi Deb,

 I think this may be the same issue as described in
 https://issues.apache.org/jira/browse/SPARK-2121 . We know that the
 container got killed by YARN because it used much more memory than it
 requested. But we haven't figured out the root cause yet.

 +Sandy

 Best,
 Xiangrui

 On Tue, Aug 19, 2014 at 8:51 PM, Debasish Das debasish.da...@gmail.com
 wrote:
  Hi,
 
  During the 4th ALS iteration, I am noticing that one of the executors gets
  disconnected:
 
  14/08/19 23:40:00 ERROR network.ConnectionManager: Corresponding
  SendingConnectionManagerId not found
 
  14/08/19 23:40:00 INFO cluster.YarnClientSchedulerBackend: Executor 5
  disconnected, so removing it
 
  14/08/19 23:40:00 ERROR cluster.YarnClientClusterScheduler: Lost
 executor 5
  on tblpmidn42adv-hdp.tdc.vzwcorp.com: remote Akka client disassociated
 
  14/08/19 23:40:00 INFO scheduler.DAGScheduler: Executor lost: 5 (epoch
 12)
  Any idea if this is a bug related to Akka on YARN?
 
  I am using master
 
  Thanks.
  Deb






saveAsTextFile makes no progress without caching RDD

2014-08-21 Thread jerryye
Hi, 
Cross-posting this from users list.

I'm running on branch-1.1 and trying to do a simple transformation on a
relatively small dataset of 64GB; saveAsTextFile essentially hangs and tasks
are stuck in running mode with the following code:

// Stalls with tasks running for over an hour with no tasks finishing.
// Smallest partition is 10MB
val data = sc.textFile("s3n://input")
val reformatted = data.map(t =>
  t.replace("Test(", "").replace(")", "").replaceAll(",", "\t"))
reformatted.saveAsTextFile("s3n://transformed")

// This runs but stalls doing GC after filling up 150% of 650GB of memory
val data = sc.textFile("s3n://input")
val reformatted = data.map(t =>
  t.replace("Test(", "").replace(")", "").replaceAll(",", "\t")).cache
reformatted.saveAsTextFile("s3n://transformed")

Any idea if this is a parameter issue and there is something I should try
out? 

Thanks! 

- jerry 



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/saveAsTextFile-makes-no-progress-without-caching-RDD-tp7949.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: saveAsTextFile to s3 on spark does not work, just hangs

2014-08-21 Thread jerryye
bump.

I'm seeing the same issue with branch-1.1. Caching the RDD before running
saveAsTextFile gets things going, but the job stalls two-thirds of the way
through by using too much memory.



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/saveAsTextFile-to-s3-on-spark-does-not-work-just-hangs-tp7795p7950.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Storage Handlers in Spark SQL

2014-08-21 Thread Niranda Perera
Hi,

I have been playing around with Spark for the past few days, evaluating the
possibility of migrating from Hive/Hadoop to Spark (Spark SQL).

I am working on the WSO2 Business Activity Monitor (WSO2 BAM,
https://docs.wso2.com/display/BAM241/WSO2+Business+Activity+Monitor+Documentation
) which currently employs Hive. We are considering Spark as a
successor to Hive, given its performance improvements.

We currently employ several custom storage handlers in Hive.
Example:
WSO2 JDBC and Cassandra storage handlers:
https://docs.wso2.com/display/BAM241/JDBC+Storage+Handler+for+Hive
https://docs.wso2.com/display/BAM241/Creating+Hive+Queries+to+Analyze+Data#CreatingHiveQueriestoAnalyzeData-cas

I would like to know whether Spark SQL can work with these storage
handlers (perhaps while using HiveContext)?

Best regards
-- 
*Niranda Perera*
Software Engineer, WSO2 Inc.
Mobile: +94-71-554-8430
Twitter: @n1r44 https://twitter.com/N1R44


RE: Spark SQL Query and join different data sources.

2014-08-21 Thread alexliu68
Presto is so far good at joining different sources/databases.

I tried a simple join query in Spark SQL; it fails with the following errors:

val a = cql("select test.a from test JOIN test1 on test.a = test1.a")
a: org.apache.spark.sql.SchemaRDD = 
SchemaRDD[0] at RDD at SchemaRDD.scala:104
== Query Plan ==
Project [a#7]
 Filter (a#7 = a#21)
  CartesianProduct 

org.apache.spark.SparkException: Job aborted due to stage failure: Task
0.0:0 failed 4 times, most recent failure: Exception failure in TID 3 on
host 127.0.0.1:
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: No function
to evaluate expression. type: AttributeReference, tree: a#7
   
org.apache.spark.sql.catalyst.expressions.AttributeReference.eval(namedExpressions.scala:158)
   
org.apache.spark.sql.catalyst.expressions.EqualTo.eval(predicates.scala:146)
   
org.apache.spark.sql.execution.Filter$$anonfun$2$$anonfun$apply$1.apply(basicOperators.scala:54)
   
org.apache.spark.sql.execution.Filter$$anonfun$2$$anonfun$apply$1.apply(basicOperators.scala:54)
scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:390)
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
scala.collection.Iterator$class.foreach(Iterator.scala:727)
scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
   
scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
   
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
   
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
scala.collection.AbstractIterator.to(Iterator.scala:1157)
   
scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
   
scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
org.apache.spark.rdd.RDD$$anonfun$16.apply(RDD.scala:731)
org.apache.spark.rdd.RDD$$anonfun$16.apply(RDD.scala:731)
   
org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1083)
   
org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1083)
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
org.apache.spark.scheduler.Task.run(Task.scala:51)
   
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
   
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)

It looks like Spark SQL still has a long way to go to be fully SQL-compatible.




--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-SQL-Query-and-join-different-data-sources-tp7914p7945.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org