Re: Stream-static join : Refreshing subset of static data / Connection pooling

2020-11-29 Thread chen kevin
To: chen kevin; Cc: German Schiavon, fanxin, User. The real question is twofold: 1) we had to do a collect on each microbatch. In high-velocity streams this could result in millions of records, causing memory

Re: Stream-static join : Refreshing subset of static data / Connection pooling

2020-11-29 Thread chen kevin
Hi, you can use Debezium to capture row-level changes in PostgreSQL in real time, stream them to Kafka, and then ETL and write the data to HBase with Flink or Spark Streaming. You can then join against the data in HBase directly. Considering the particularly big table, the scan performance

Re: how to manage HBase connections in Executors of Spark Streaming ?

2020-11-25 Thread chen kevin
1. On the issue of Kerberos expiry: * You usually don't need to worry about this; you can use the local keytab on every node in the Hadoop cluster. * If the keytab is not present on the nodes of your Hadoop cluster, you will need to update the keytab in every executor periodically. 2.

Re: Missing output partition file in S3

2016-09-19 Thread Chen, Kevin
, Kevin From: Steve Loughran <ste...@hortonworks.com> Date: Friday, September 16, 2016 at 3:46 AM To: Chen Kevin <kevin.c...@neustar.biz> Cc: "user@spark.apache.org"

Missing output partition file in S3

2016-09-15 Thread Chen, Kevin
Hi, has anyone encountered an issue of a missing output partition file in S3? My Spark job writes output to an S3 location. Occasionally, I noticed one partition file is missing. As a result, one chunk of data was lost. If I rerun the same job, the problem usually goes away. This has been

SaveAsTextFile to S3 bucket

2015-01-26 Thread Chen, Kevin
Does anyone know if I can save an RDD as a text file to a pre-created directory in an S3 bucket? I have a directory created in an S3 bucket: //nexgen-software/dev When I tried to save an RDD as a text file in this directory: rdd.saveAsTextFile("s3n://nexgen-software/dev/output"); I got the following

Re: SaveAsTextFile to S3 bucket

2015-01-26 Thread Chen, Kevin
is the behaviour? On Tue, Jan 27, 2015 at 6:21 AM, Chen, Kevin <kevin.c...@neustar.biz> wrote: Does anyone know if I can save an RDD as a text file to a pre-created directory in an S3 bucket? I have a directory created in an S3 bucket: //nexgen-software/dev When I tried