To: Chen Kevin
Cc: German Schiavon, fanxin, User
Subject: Re: Stream-static join: Refreshing subset of static data / Connection pooling
The real question is twofold:
1) We had to do a collect() on each micro-batch. In high-velocity streams this could result in millions of records, causing memory pressure on the driver.
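One common way to avoid collecting each micro-batch to the driver is to process records inside foreachPartition with a per-process connection pool, so each executor reuses a small number of connections instead of shipping data back to the driver. A minimal sketch of that pool pattern (the make_connection factory and the partition-processing body are hypothetical stand-ins, not from any particular driver library):

```python
import threading
import queue

class ConnectionPool:
    """Lazily-built, process-wide pool: created once per executor
    process, shared by all tasks in it, reused across micro-batches."""
    _instance = None
    _lock = threading.Lock()

    def __init__(self, factory, size=4):
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(factory())

    @classmethod
    def get_instance(cls, factory, size=4):
        # Double-checked creation so concurrent tasks share one pool.
        with cls._lock:
            if cls._instance is None:
                cls._instance = cls(factory, size)
        return cls._instance

    def borrow(self):
        return self._pool.get()

    def give_back(self, conn):
        self._pool.put(conn)


def make_connection():
    # Hypothetical connection factory, for illustration only.
    return object()

def process_partition(rows):
    # The kind of body you would pass to rdd.foreachPartition(...).
    pool = ConnectionPool.get_instance(make_connection)
    conn = pool.borrow()
    try:
        for row in rows:
            pass  # look up / write the row using conn
    finally:
        pool.give_back(conn)
```

Because the pool lives at module/process scope, every task in the same executor process gets the same handful of connections rather than opening one per record or per batch.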
Hi,
You can use Debezium to capture the row-level changes in PostgreSQL in real time, stream them to Kafka, and finally ETL and write the data to HBase with Flink or Spark Streaming. Then you can join against the data in HBase directly. For a particularly big table, the scan performance
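To make the CDC step concrete, here is a minimal sketch of applying Debezium-style change events to a local dict standing in for the HBase table. The event shape follows Debezium's op/before/after envelope; keying rows by an "id" field and the dict-as-table are illustration-only assumptions:

```python
import json

def apply_change_event(table, raw_event):
    """Apply one Debezium-style change event to `table`
    (a dict keyed by primary key, standing in for an HBase table)."""
    event = json.loads(raw_event)
    op = event["op"]           # "c"=create, "u"=update, "d"=delete
    if op in ("c", "u", "r"):  # "r" = snapshot read
        row = event["after"]
        table[row["id"]] = row
    elif op == "d":
        row = event["before"]
        table.pop(row["id"], None)
    return table
```

A real pipeline would run this apply logic inside the Flink/Spark Streaming job, writing puts/deletes to HBase instead of a dict.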
1. The issue of Kerberos ticket expiry.
* Usually you don't need to worry about it; you can use the local keytab at every node in the Hadoop cluster.
* If there is no keytab in your Hadoop cluster, you will need to update your keytab in every executor periodically.
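For the second case, the periodic renewal can be a small background thread that re-runs kinit against the keytab on a schedule. A sketch, with the renew callable injected so the scheduling logic is testable without Kerberos (the keytab path and principal shown in the comment are placeholders):

```python
import threading

def schedule_renewal(renew, interval_seconds, stop_event):
    """Call `renew()` every `interval_seconds` until `stop_event` is set.
    In production, `renew` might shell out to something like:
        kinit -kt /path/to/user.keytab user@EXAMPLE.COM  # placeholder values
    """
    def loop():
        # Event.wait returns False on timeout (time to renew)
        # and True once stop_event is set (time to exit).
        while not stop_event.wait(interval_seconds):
            renew()
    t = threading.Thread(target=loop, daemon=True)
    t.start()
    return t
```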
2.
Kevin
From: Steve Loughran <ste...@hortonworks.com>
Date: Friday, September 16, 2016 at 3:46 AM
To: Chen Kevin <kevin.c...@neustar.biz>
Cc: "user@spark.apache.org"
Hi,
Has anyone encountered an issue of a missing output partition file in S3? My Spark job writes output to an S3 location. Occasionally, I noticed one partition file is missing, and as a result one chunk of data was lost. If I rerun the same job, the problem usually goes away. This has been
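One defensive check after such a job is to list the output keys and verify that the part-NNNNN indices form an unbroken sequence, so a silently dropped partition file is caught before downstream consumers read the data. A sketch over plain filenames (the S3 listing call itself is elided):

```python
import re

def missing_parts(filenames):
    """Return the part-file indices absent from 0..max(index)."""
    pattern = re.compile(r"part-(\d+)")
    indices = sorted(
        int(m.group(1)) for f in filenames if (m := pattern.search(f))
    )
    if not indices:
        return []
    present = set(indices)
    return [i for i in range(indices[-1] + 1) if i not in present]
```

If the returned list is non-empty, the job output is incomplete and should be regenerated rather than consumed.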
Does anyone know if I can save an RDD as a text file to a pre-created directory in an S3 bucket?
I have a directory created in the S3 bucket: //nexgen-software/dev
When I tried to save an RDD as a text file in this directory:
rdd.saveAsTextFile("s3n://nexgen-software/dev/output");
I got the following. Is this the expected behaviour?
On Tue, Jan 27, 2015 at 6:21 AM, Chen, Kevin <kevin.c...@neustar.biz> wrote: