Hello guys,
I am Yuqi from Teradata Tokyo. Sorry to disturb you, but I have a problem
using Spark 2.4's client-mode feature on a Kubernetes cluster, and I would
like to ask whether there is a solution. The problem occurs when I try to
run spark-shell on Kubernetes v1.11.3.
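Since the message is cut off before the actual failure is described, here is only a minimal sketch of the configuration a client-mode driver on Kubernetes typically needs, whether passed as --conf flags to spark-shell or set on the builder. The API server URL, container image, and driver host below are placeholders, not values from the original message:

import org.apache.spark.sql.SparkSession

// Minimal client-mode setup against a Kubernetes master (all values are placeholders).
val spark = SparkSession.builder()
  .appName("k8s-client-mode")
  .master("k8s://https://<api-server-host>:6443")                  // Kubernetes API server
  .config("spark.kubernetes.container.image", "<spark-2.4-image>") // image for executor pods
  .config("spark.driver.host", "<host-reachable-from-pods>")       // executors must reach the driver
  .config("spark.executor.instances", "2")
  .getOrCreate()

In client mode the driver runs outside the cluster, so spark.driver.host has to be routable from the executor pods; that is a common stumbling block with spark-shell on Kubernetes.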
Hello,
I have the edges of a graph stored as parquet files (about 3GB). I am loading
the graph and trying to compute the total number of triplets and triangles.
Here is my code:
val edges_parq = sqlContext.read.parquet(args(0) + "/year=" + year) // the "header" option only applies to CSV sources, so it is dropped here
val edges:
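The snippet is truncated at val edges:, so what follows is only a hedged completion using GraphX; the column names "src" and "dst" and the Long vertex-id type are assumptions about the parquet schema:

import org.apache.spark.graphx.{Edge, Graph}

// Assumed schema: two Long columns "src" and "dst" per edge row.
val edges = edges_parq.rdd.map(row =>
  Edge(row.getAs[Long]("src"), row.getAs[Long]("dst"), 1))
val graph = Graph.fromEdges(edges, defaultValue = 0)

// A triplet is an edge together with its two endpoint vertices,
// so the triplet count equals the edge count.
val numTriplets = graph.triplets.count()

// triangleCount() records at each vertex the number of triangles through it;
// every triangle is counted once at each of its three corners.
val numTriangles = graph.triangleCount().vertices.map(_._2).sum() / 3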
Hi,
While doing a Cartesian multiplication on a matrix, I got this error:
pyspark.sql.utils.IllegalArgumentException: requirement failed: Number of
rows divided by rowsPerBlock cannot exceed maximum integer.
Here is the code:
from pyspark.ml.feature import Normalizer  # import added; the original snippet omitted it
normalizer = Normalizer(inputCol="feature", outputCol="norm")
data =
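The code is cut off, but the error itself comes from a requirement check when mllib converts to a BlockMatrix: the number of rows divided by rowsPerBlock must fit in an Int. A common trigger is sparse row indices (for example from monotonically_increasing_id), because an IndexedRowMatrix infers its row count as max(index) + 1. Below is a sketch in Scala to match the other snippets in this thread; PySpark's IndexedRowMatrix.toBlockMatrix takes the same rowsPerBlock and colsPerBlock arguments, and the input RDD here is made up for illustration:

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.{IndexedRow, IndexedRowMatrix}

// Made-up input; in the real job this would be the normalized feature vectors.
val featureRdd = sc.parallelize(Seq(
  Vectors.dense(1.0, 0.0),
  Vectors.dense(0.0, 1.0)))

// zipWithIndex yields dense 0..n-1 indices; sparse ids inflate the inferred
// row count and can trip the "maximum integer" requirement.
val rows = featureRdd.zipWithIndex.map { case (v, i) => IndexedRow(i, v) }
val mat = new IndexedRowMatrix(rows)

// Fewer, larger blocks keep numRows / rowsPerBlock within Int range.
val blocks = mat.toBlockMatrix(rowsPerBlock = 4096, colsPerBlock = 4096)
val gram = blocks.transpose.multiply(blocks) // A^T * A, i.e. all pairwise products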
Thanks for bringing this issue to the mailing list.
As an addition, I would ask the same questions about the DStreams and
Structured Streaming APIs. Structured Streaming is high-level, which makes
it difficult to express all business logic in it, although Databricks is
pushing it and recommending it.
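One concrete escape hatch since Spark 2.4 is foreachBatch, which hands every micro-batch to your function as an ordinary DataFrame, so batch-style business logic can still be expressed inside a Structured Streaming job. A minimal sketch, assuming a spark-shell session and using the built-in rate source as a stand-in for a real input:

import org.apache.spark.sql.DataFrame

val stream = spark.readStream
  .format("rate") // test source emitting (timestamp, value) rows
  .load()

val query = stream.writeStream
  .foreachBatch { (batch: DataFrame, batchId: Long) =>
    // Arbitrary per-batch logic on a plain DataFrame goes here.
    batch.groupBy("value").count().show()
  }
  .start()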
Hi,
There are functions such as map, flatMap, reduce, and so on that form the
basic data-processing operations in big data (and in Apache Spark). But in
recent versions Spark has introduced the high-level DataFrame API and
recommends using it, even though there are no such functions in the
DataFrame API.
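For what it is worth, in Scala a DataFrame is just a Dataset[Row], so map, flatMap, and reduce are still available on it; they simply need encoders for the result types. A minimal sketch, assuming a spark-shell session (which provides spark and its implicits):

import spark.implicits._

val df = Seq(("a", 1), ("b", 2), ("b", 3)).toDF("key", "value")

// map produces a typed Dataset from the DataFrame...
val doubled = df.map(row => row.getInt(1) * 2) // Dataset[Int]

// ...and reduce folds it down to a single value.
val total = doubled.reduce(_ + _) // 12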