Hello,
My Spark Streaming app that reads Kafka topics and prints the DStream works
fine on my laptop, but on the AWS cluster it produces no output and no errors.
Please help me debug.
I am using Spark 2.0.2 and kafka-0-10
Thanks
The following is the output of the Spark Streaming app...
17/01/14
I’m looking for tips on how to debug a PythonException that’s very sparse
on details. The full exception is below, but the only interesting bits
appear to be the following lines:
org.apache.spark.api.python.PythonException:
...
py4j.protocol.Py4JError: An error occurred while calling
None.org.apac
Structured Streaming has a foreach sink, where you can essentially do what
you want with your data. It's easy to create a Kafka producer and write the
data out to Kafka.
http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#using-foreach
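A minimal sketch of such a sink, assuming a broker at localhost:9092 and a
hypothetical output topic (the ForeachWriter API is the one described in the
guide linked above):

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.spark.sql.{ForeachWriter, Row}

// Sends each row of a streaming query to Kafka as a comma-joined string.
class KafkaSink(brokers: String, topic: String) extends ForeachWriter[Row] {
  var producer: KafkaProducer[String, String] = _

  override def open(partitionId: Long, version: Long): Boolean = {
    val props = new Properties()
    props.put("bootstrap.servers", brokers)
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    producer = new KafkaProducer[String, String](props)
    true
  }

  override def process(row: Row): Unit =
    producer.send(new ProducerRecord[String, String](topic, row.mkString(",")))

  override def close(errorOrNull: Throwable): Unit =
    producer.close()
}

// Usage: df.writeStream.foreach(new KafkaSink("localhost:9092", "out-topic")).start()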
On Fri, Jan 13, 2017 at 8:28 AM, K
I need to filter out outliers from a DataFrame on all columns. I can
manually list all columns, like:
df.filter(x => math.abs(x.get(0).toString.toDouble - means(0)) <= 3 * stddevs(0))
  .filter(x => math.abs(x.get(1).toString.toDouble - means(1)) <= 3 * stddevs(1))
  ...
But I want to turn it into a ge
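A hedged sketch of one way to make this generic, assuming means and stddevs
are Seq[Double] indexed in the same order as df.columns:

// Fold the 3-sigma filter over every column index instead of
// listing each column by hand (means/stddevs are precomputed).
val filtered = df.columns.indices.foldLeft(df) { (acc, i) =>
  acc.filter(x => math.abs(x.get(i).toString.toDouble - means(i)) <= 3 * stddevs(i))
}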
In terms of the nullPointerException, I think it is a bug, since the test
data directories might have been moved already, so it failed to load the
test data to create the test tables. You may create a JIRA for this.
On Fri, Jan 13, 2017 at 11:44 AM, Xin Wu wrote:
> If you are using spark-shell, you have
If you are using spark-shell, you already have the instance "sc" as the
SparkContext, initialized for you. If you are writing your own application,
you need to create a SparkSession, which comes with the SparkContext, so you
can reference it like sparkSession.sparkContext.
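For example (a minimal sketch; the app name is arbitrary):

import org.apache.spark.sql.SparkSession

// The session builder creates and owns the SparkContext.
val spark = SparkSession.builder()
  .appName("my-app")
  .getOrCreate()

val sc = spark.sparkContext  // reference the underlying SparkContext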
In terms of creating a table from Dat
But it forces you to create your own SparkContext, which I’d rather not do.
Also, it doesn't seem to allow me to directly create a table from a DataFrame,
as follows:
TestHive.createDataFrame[MyType](rows).write.saveAsTable("a_table")
From: Xin Wu [mailto:xwu0...@gmail.com]
Sent: January 13, 2017
I used the following:
val testHive = new org.apache.spark.sql.hive.test.TestHiveContext(sc, false)
val hiveClient = testHive.sessionState.metadataHive
hiveClient.runSqlHive("....")
On Fri, Jan 13, 2017 at 6:40 AM, Nicolas Tallineau <
nicolas.tallin...@ubisoft.com> wrote:
> I get a nullPointerE
How do you do this with Structured Streaming? I see no mention of writing
to Kafka.
On Fri, Jan 13, 2017 at 10:30 AM, Peyman Mohajerian
wrote:
> Yes, it is called Structured Streaming:
> https://docs.databricks.com/_static/notebooks/structured-streaming-kafka.html
> http://spark.apache.org/docs/
There is no automated solution right now. You have to issue manual ALTER
TABLE commands, which work for adding top-level columns but get tricky if
you are adding a field in a deeply nested struct.
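For a top-level column, such a manual command looks roughly like this (table
and column names here are hypothetical; this is standard Hive DDL):

ALTER TABLE events ADD COLUMNS (new_col STRING);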
Hopefully, the issue will be fixed in 2.2 because work has started on
https://issues.apache.org/ji
Hello,
Thanks a lot Dinko.
Yes, now it is working perfectly.
Cheers,
Anahita
On Fri, Jan 13, 2017 at 2:19 PM, Dinko Srkoč wrote:
> On 13 January 2017 at 13:55, Anahita Talebi
> wrote:
> > Hi,
> >
> > Thanks for your answer.
> >
> > I have chosen "Spark" in the "job type". There isn't any opt
Yes, it is called Structured Streaming:
https://docs.databricks.com/_static/notebooks/structured-streaming-kafka.html
http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html
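A minimal sketch of the Kafka source from those docs (requires the
spark-sql-kafka-0-10 package; broker address and topic name are placeholders):

val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "my-topic")
  .load()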
On Fri, Jan 13, 2017 at 3:32 AM, Senthil Kumar
wrote:
> Hi Team ,
>
> Sorry if this question
I get a nullPointerException as soon as I try to execute a TestHive.sql(...)
statement since migrating to Spark 2, because it's trying to load nonexistent
"test tables". I couldn't find a way to switch the loadTestTables variable
to false.
Caused by: sbt.ForkMain$ForkError: java.lang.NullPointe
On 13 January 2017 at 13:55, Anahita Talebi wrote:
> Hi,
>
> Thanks for your answer.
>
> I have chosen "Spark" in the "job type". There isn't any option where we
> can choose the version. How can I choose a different version?
There's "Preemptible workers, bucket, network, version,
initialization, &
Hi,
Thanks for your answer.
I have chosen "Spark" in the "job type". There isn't any option where we
can choose the version. How can I choose a different version?
Thanks,
Anahita
On Thu, Jan 12, 2017 at 6:39 PM, A Shaikh wrote:
> You may have tested this code on the Spark version on your local mac
Hi, you can take a look at this project; it is a distributed HA Spark
cluster for the AWS environment using Docker. We put the Spark EC2
instances in an ELB and use this code snippet to get the instance
IPs:
https://github.com/zalando-incubator/spark-appliance/blob/master/utils.py#L49-L56
Dockerfi
Hi Team ,
Sorry if this question was already asked in this forum.
Can we ingest data into an Apache Kafka topic from a Spark SQL DataFrame?
Here is my code, which reads a Parquet file:
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val df = sqlContext.read.parquet("/temp/*.parquet
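One hedged way to then push such a DataFrame to Kafka in batch, serializing
rows as JSON (broker address and topic name are assumptions for illustration):

df.toJSON.foreachPartition { partition =>
  val props = new java.util.Properties()
  props.put("bootstrap.servers", "localhost:9092")
  props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  val producer = new org.apache.kafka.clients.producer.KafkaProducer[String, String](props)
  // send each row's JSON representation as the record value
  partition.foreach { json =>
    producer.send(new org.apache.kafka.clients.producer.ProducerRecord[String, String]("parquet-out", json))
  }
  producer.close()
}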