hadoop-2.6 is supported (look for profile XML in the pom.xml file).
For Hive, add -Phive -Phive-thriftserver. See
http://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables
for more details.
dean
Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
http
Check out StreamingContext.queueStream (
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.streaming.StreamingContext
)
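A minimal sketch of how queueStream can feed hand-built RDDs into a DStream, for example in a test. It assumes Spark Streaming is on the classpath and a local master is acceptable; the app name, batch interval, and data are illustrative assumptions, not taken from the original thread.

```scala
import java.util.concurrent.ConcurrentLinkedQueue
import scala.collection.mutable
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Assumption: local mode with two threads; the app name is arbitrary.
val conf = new SparkConf().setMaster("local[2]").setAppName("QueueStreamSketch")
val ssc = new StreamingContext(conf, Seconds(1))

// Each RDD pushed onto this queue is served as one batch of the stream.
val rddQueue = new mutable.Queue[RDD[Int]]()
val doubled = ssc.queueStream(rddQueue).map(_ * 2)

// Gather results on the driver so they can be inspected afterwards.
val seen = new ConcurrentLinkedQueue[Int]()
doubled.foreachRDD { rdd => rdd.collect().foreach(x => seen.add(x)) }

rddQueue += ssc.sparkContext.makeRDD(1 to 5)
ssc.start()
ssc.awaitTerminationOrTimeout(5000) // let a few batches run
ssc.stop()
```

Collecting inside foreachRDD is only reasonable here because the batches are tiny.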
Dynamic allocation doesn't work yet with Spark Streaming in any cluster
scenario. There was a previous thread on this topic which discusses the
issues that need to be resolved.
val array = line.split(",") // assume: "name,street,city"
User(array(0), Address(array(1), array(2)))
}.toDF()
scala> df.printSchema
root
|-- name: string (nullable = true)
|-- address: struct (nullable = true)
| |-- street: string (nullable = true)
| |-- city:
-separated input data:
case class Address(street: String, city: String)
case class User (name: String, address: Address)
sc.textFile("/path/to/stuff").
map { line =>
val array = line.split(",") // assume: "name,street,city"
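The parsing step above can be sketched in plain Scala, without a Spark cluster. The helper name parseUser and the sample lines are hypothetical; on an RDD the same function could be passed to flatMap before calling toDF(), assuming the SQL implicits are imported.

```scala
case class Address(street: String, city: String)
case class User(name: String, address: Address)

// Parse one "name,street,city" line; None if the line is malformed.
def parseUser(line: String): Option[User] =
  line.split(",").map(_.trim) match {
    case Array(name, street, city) => Some(User(name, Address(street, city)))
    case _                         => None
  }

// Hypothetical sample input: one good line, one malformed line.
val lines = Seq("Ann,1 Main St,Chicago", "bad line")
val users = lines.flatMap(line => parseUser(line))
```

Returning an Option lets malformed lines be dropped instead of failing with an index error.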
-jars"
option. Note that the latter may not be an ideal solution if it has other
dependencies that also need to be passed.
Is myRDD outside a DStream? If so, are you persisting it on each batch
iteration? It should be checkpointed frequently, too.
to call collect
on toUpdate before using foreach(println). If the RDD is huge, you
definitely don't want to do that.
Hope this helps.
hop/InvertedIndex5b.scala>
for a taste of how concise it makes code!
4. Type inference: Spark really shows its utility. It means a lot less code
to write, yet you still get hints about what you just wrote!
My $0.02.
files overall.
On Sun, Sep 13, 2015 at 12:5
Here's a demonstration video from @noootsab himself (creator of Spark
Notebook) showing live charting in Spark Notebook. It's one reason I prefer
it over the other options.
https://twitter.com/noootsab/status/638489244160401408
of your example,
(r: ResultSet) => (r.getInt("col1"), r.getInt("col2") ... r.getInt("col37"))
could add nested () to group elements and keep the outer number of elements
<= 22.
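A small sketch of the nesting idea, shown with six columns rather than 37 to keep it short. The getInt parameter is a hypothetical stand-in for ResultSet.getInt(columnName), so the example runs without a database; the column names are illustrative.

```scala
// Group the columns into sub-tuples; the outer tuple then has only 3 elements.
def toNested(getInt: String => Int): ((Int, Int), (Int, Int), (Int, Int)) =
  (
    (getInt("col1"), getInt("col2")),
    (getInt("col3"), getInt("col4")),
    (getInt("col5"), getInt("col6"))
  )

// A Map[String, Int] is itself a String => Int, so it can play the row here.
val row = (1 to 6).map(i => s"col$i" -> i).toMap
val nested = toNested(row)

// Pattern matching reaches into the nested groups.
val ((c1, _), _, (_, c6)) = nested
```

The same grouping applies to a real row mapper: as long as each sub-tuple and the outer tuple stay at 22 elements or fewer, the limit is respected.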
d routable inside the cluster. Recall that EC2 instances
have both public and private host names & IP addresses.
Also, is the port number correct for HDFS in the cluster?
, it was compiled with Java 6 (see
https://en.wikipedia.org/wiki/Java_class_file). So, it doesn't appear to be
a Spark build issue.
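One way to check which Java version a class was compiled for is to read the version fields in the class file header, as described on the Wikipedia page above. This is a sketch only; the function name is hypothetical, and the byte array is a synthetic header standing in for a real .class file.

```scala
import java.io.{ByteArrayInputStream, DataInputStream, InputStream}

// Read the major version from a class file header:
// 4-byte magic 0xCAFEBABE, 2-byte minor version, 2-byte major version.
def classFileMajorVersion(in: InputStream): Int = {
  val data = new DataInputStream(in)
  try {
    require(data.readInt() == 0xCAFEBABE, "not a class file")
    data.readUnsignedShort() // minor version, ignored here
    data.readUnsignedShort() // major version: 50 = Java 6, 51 = Java 7, 52 = Java 8
  } finally data.close()
}

// Synthetic header with major version 0x32 (50, i.e. Java 6).
val header = Array(0xCA, 0xFE, 0xBA, 0xBE, 0x00, 0x00, 0x00, 0x32).map(_.toByte)
val major = classFileMajorVersion(new ByteArrayInputStream(header))
```

For a real check, pass a FileInputStream over the .class file extracted from the jar in question.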
ConcurrentHashMap.keySet() returning a KeySetView is a Java 8 method. The
Java 7 method returns a Set. Are you running Java 7? What happens if you
run Java 8?
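If code compiled on Java 8 has to run on Java 7, one commonly used workaround is to call keySet() through the java.util.Map interface, so the compiled call no longer refers to the Java 8 return type. The sketch below is an assumption about how that might look, not something from the original thread; the variable names are illustrative.

```scala
import java.util.concurrent.ConcurrentHashMap

val cache = new ConcurrentHashMap[String, String]()
cache.put("a", "1")

// Calling keySet() through the Map interface makes the compiled call target
// Map.keySet(), which returns java.util.Set on both Java 7 and Java 8.
val asMap: java.util.Map[String, String] = cache
val keys: java.util.Set[String] = asMap.keySet()

// Which Java version is this JVM actually running?
val runtimeVersion = System.getProperty("java.version")
```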
More specifically, you could have TBs of data across thousands of
partitions for a single RDD. If you call collect(), BOOM!
You can call the collect() method to return a collection, but be careful.
If your data is too big to fit in the driver's memory, the driver will crash.
You can look at the logic in the Spark code base, RDD.scala (first method
calls the take method) and SparkContext.scala (runJob method, which take
calls).
However, the exceptions definitely look like bugs to me. There must be some
empty partitions.
If you mean retaining data from past jobs, try running the history server,
documented here:
http://spark.apache.org/docs/latest/monitoring.html
ClassNotFoundException usually means one of a few problems:
1. Your app assembly is missing the jar files with those classes.
2. You mixed jar files from incompatible versions in your assembly.
3. You built with one version of Spark and deployed to another.
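When chasing any of the causes listed above, it can help to ask the JVM where a class was actually loaded from, or whether it is visible at all. A minimal sketch follows; the helper name locate and the "<bootstrap>" label are hypothetical choices for illustration.

```scala
// Report where a class was loaded from, or None if it is not on the classpath.
def locate(className: String): Option[String] =
  try {
    val cls = Class.forName(className)
    val location = Option(cls.getProtectionDomain.getCodeSource)
      .flatMap(source => Option(source.getLocation))
      .map(_.toString)
    // Classes from the JDK's bootstrap loader have no code source.
    Some(location.getOrElse("<bootstrap>"))
  } catch {
    case _: ClassNotFoundException => None
  }

val missing = locate("no.such.Class")
val present = locate("java.lang.String")
```

Run inside the application, a None result points at a missing jar, while an unexpected jar path points at mixed or mismatched versions.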
It works on Mesos, too. I'm not sure about Standalone mode.
On Wed, Feb 10, 2016 at 3:51 PM, Nipun
ust's talk at Spark
Summit East nicely made this point.
http://www.slideshare.net/databricks/structuring-spark-dataframes-datasets-and-streaming