Re: [DISCUSS] KIP-138: Change punctuate semantics

2017-04-07 Thread Tianji Li
Hi Arun,

Could you share your code?

Thanks
Tianji

On Fri, Apr 7, 2017 at 4:21 AM, Arun Mathew <amat...@yahoo-corp.jp> wrote:

> Thanks Michal. I don't have access yet [arunmathew88]. Should I be
> sending a separate mail for this?
>
> I thought one of the people following this thread would be able to give me
> access.
>
>
>
> From: Michal Borowiecki <michal.borowie...@openbet.com>
> Reply-To: "dev@kafka.apache.org" <dev@kafka.apache.org>
> Date: Friday, April 7, 2017 at 17:16
> To: "dev@kafka.apache.org" <dev@kafka.apache.org>
> Subject: Re: [DISCUSS] KIP-138: Change punctuate semantics
>
>
>
> Hi Arun,
>
> I was thinking along the same lines as you, listing the use cases on the
> wiki, but haven't found time to get around to doing that yet.
> I don't mind if you do it, if you have access now.
> I was thinking it would be nice if, once we have the use cases listed,
> people could use likes to up-vote the use cases similar to what they're
> working on.
>
> I should have a bit more time to action this in the next few days, but
> happy for you to do it if you can beat me to it ;-)
>
> Cheers,
> Michal
>
> On 07/04/17 04:39, Arun Mathew wrote:
>
> Sure, Thanks Matthias. My id is [arunmathew88].
>
>
>
> Of course. I was thinking of a subpage where people can collaborate.
>
>
>
> Will do as per Michael’s suggestion.
>
>
>
> Regards,
>
> Arun Mathew
>
>
>
> On 4/7/17, 12:30, "Matthias J. Sax" <matth...@confluent.io> 
> <matth...@confluent.io> wrote:
>
>
>
> Please share your Wiki-ID and a committer can give you write access.
>
>
>
> Btw: as you did not initiate the KIP, you should not change the KIP
>
> without the permission of the original author -- in this case Michael.
>
>
>
> So you might also just share your thoughts over the mailing list and
>
> Michael can update the KIP page. Or, as an alternative, just create a
>
> subpage for the KIP page.
>
>
>
> @Michael: WDYT?
>
>
>
>
>
> -Matthias
>
>
>
>
>
> On 4/6/17 8:05 PM, Arun Mathew wrote:
>
> > Hi Jay,
>
> >   Thanks for the advice. I would like to list the use cases as per your
> > suggestion, but it seems I don't have write permission to the Apache Kafka
> > Confluence space. Whom shall I request it from?
>
> >
>
> > Regarding your last question: we are using a patch in our production
> > system which does exactly this.
> > We window by the event time, but trigger punctuate in <punctuate interval>
> > duration of system time, in the absence of an event crossing the punctuate
> > event time.
>
> >
>
> > We are using Kafka Streams for our audit trail, where we need to output
> > the event counts on each topic on each cluster, aggregated over a 1-minute
> > window. We have to use event time to be able to cross-check the counts.
> > But we need to trigger punctuate [aggregate event pushes] by system time
> > in the absence of events. Otherwise the event counts for unexpired windows
> > would be 0, which is bad.
>
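A rough sketch of that pattern with the 0.10.x-era Processor API (class and field names here are illustrative assumptions, not the actual patch):

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.kafka.streams.processor.Processor;
    import org.apache.kafka.streams.processor.ProcessorContext;

    // Counts events per key and pushes the aggregates from punctuate(). Under the
    // current semantics punctuate() only fires when new events advance stream time,
    // so an idle topic never emits a count -- the problem described above.
    public class AuditCountProcessor implements Processor<String, String> {
        private ProcessorContext context;
        private final Map<String, Long> counts = new HashMap<>();

        @Override
        public void init(ProcessorContext context) {
            this.context = context;
            context.schedule(60_000L); // once per minute -- of stream time, not system time
        }

        @Override
        public void process(String topicKey, String value) {
            counts.merge(topicKey, 1L, Long::sum);
        }

        @Override
        public void punctuate(long timestamp) {
            counts.forEach((topicKey, count) -> context.forward(topicKey, count.toString()));
            counts.clear();
        }

        @Override
        public void close() {}
    }
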
> >
>
> > "Maybe a hybrid solution works: I window by event time but trigger 
> results
>
> > by system time for windows that have updated? Not really sure the 
> details
>
> > of making that work. Does that work? Are there concrete examples where 
> you
>
> > actually want the current behavior?"
>
> >
>
> > --
>
> > With Regards,
>
> >
>
> > Arun Mathew
>
> > Yahoo! JAPAN Corporation
>
> >
>
> > On Wed, Apr 5, 2017 at 8:48 PM, Tianji Li <skyah...@gmail.com> wrote:
>
> >
>
> >> Hi Jay,
>
> >>
>
> >> The hybrid solution is exactly what I expect and need for our use cases
>
> >> when dealing with telecom data.
>
> >>
>
> >> Thanks
>
> >> Tianji
>
> >>
>
> >> On Wed, Apr 5, 2017 at 12:01 AM, Jay Kreps <j...@confluent.io> wrote:
>
> >>
>
> >>> Hey guys,
>
> >>>
>
> >>> One thing I've always found super important for this kind of design work
> >>> is to do a r

Re: [DISCUSS] KIP-138: Change punctuate semantics

2017-04-05 Thread Tianji Li
Hi Jay,

The hybrid solution is exactly what I expect and need for our use cases
when dealing with telecom data.

Thanks
Tianji

On Wed, Apr 5, 2017 at 12:01 AM, Jay Kreps  wrote:

> Hey guys,
>
> One thing I've always found super important for this kind of design work is
> to do a really good job of cataloging the landscape of use cases and how
> prevalent each one is. By that I mean not just listing lots of uses, but
> also grouping them into categories that functionally need the same thing.
> In the absence of this it is very hard to reason about design proposals.
> From the proposals so far I think we have a lot of discussion around
> possible apis, but less around what the user needs for different use cases
> and how they would implement that using the api.
>
> Here is an example:
> You aggregate click and impression data for a reddit like site. Every ten
> minutes you want to output a ranked list of the top 10 articles ranked by
> clicks/impressions for each geographical area. I want to be able to run this
> in steady state as well as rerun to regenerate results (or catch up if it
> crashes).
>
> There are a couple of tricky things that seem to make this hard with either
> of the options proposed:
> 1. If I emit this data using event time I have the problem described where
> a geographical region with no new clicks or impressions will fail to output
> results.
> 2. If I emit this data using system time I have the problem that when
> reprocessing data my window may not be ten minutes but 10 hours if my
> processing is very fast so it dramatically changes the output.
>
> Maybe a hybrid solution works: I window by event time but trigger results
> by system time for windows that have updated? Not really sure the details
> of making that work. Does that work? Are there concrete examples where you
> actually want the current behavior?
>
> -Jay
>
>
> On Tue, Apr 4, 2017 at 8:32 PM, Arun Mathew 
> wrote:
>
> > Hi All,
> >
> > Thanks for the KIP. We were also in need of a mechanism to trigger
> > punctuate in the absence of events.
> >
> > As I described in [
> > https://issues.apache.org/jira/browse/KAFKA-3514?focusedCommentId=15926036&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15926036
> > ],
> >
> >- Our approach involves using the event time by default.
> >- The method that checks whether any punctuate is ready in the
> >PunctuationQueue is triggered by any event received by the stream thread,
> >or at the polling intervals in the absence of any events.
> >- When we create Punctuate objects (which contain the next punctuation
> >event time and the interval), we also record the creation time (system
> >time).
> >- While checking the maturity of a punctuate schedule in the mayBePunctuate
> >method, we also check whether the punctuate interval has elapsed on the
> >system clock since the schedule's creation time.
> >- In the absence of any event, or in the absence of any event for one
> >topic in the partition group assigned to the stream task, the system time
> >will elapse the interval and we trigger a punctuate using the expected
> >punctuation event time.
> >- We then create the next punctuation schedule as punctuation event time
> >+ punctuation interval, [again recording the system time of creation of
> >the schedule]. A rough sketch of this check follows below.
> >
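A minimal, framework-agnostic sketch of the hybrid maturity check described in the list above (class and method names are illustrative assumptions, not the actual patch):

    // A schedule matures either when event time passes the next punctuation event
    // time, or when the interval has elapsed on the system clock since the schedule
    // was (re)created -- whichever comes first.
    public class HybridSchedule {
        private final long intervalMs;          // punctuation interval
        private long nextPunctuationEventTime;  // next event time at which to punctuate
        private long creationSystemTime;        // system time when this schedule was (re)created

        public HybridSchedule(long firstEventTime, long intervalMs) {
            this.intervalMs = intervalMs;
            this.nextPunctuationEventTime = firstEventTime + intervalMs;
            this.creationSystemTime = System.currentTimeMillis();
        }

        /** Called on every received event, and at every poll interval when no events arrive. */
        public boolean maybePunctuate(long currentEventTime) {
            boolean matureByEventTime = currentEventTime >= nextPunctuationEventTime;
            boolean matureBySystemTime =
                System.currentTimeMillis() - creationSystemTime >= intervalMs;
            if (matureByEventTime || matureBySystemTime) {
                // punctuate using the *expected* punctuation event time, then schedule the next one
                nextPunctuationEventTime += intervalMs;
                creationSystemTime = System.currentTimeMillis();
                return true;
            }
            return false;
        }
    }
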
> > We call this a Hybrid Punctuate. Of course, this approach has pros and
> > cons.
> > Pros
> >
> >- Punctuates will happen within <punctuate interval> at most, in terms of
> >system time.
> >- The semantics as a whole continue to revolve around event time.
> >- We can use the old data [old timestamps] to rerun any experiments or
> >tests.
> >
> > Cons
> >
> >- In case the <punctuate interval> is not a time duration [say logical
> >time/event count], then the approach might not be meaningful.
> >- In case we have to wait for an actual event from a low-event-rate
> >partition in the partition group, this approach will jump the gun.
> >- In case the event processing cannot catch up with the event rate and
> >events with the expected timestamps get queued for a long time, this
> >approach might also jump the gun.
> >
> > I believe the above approach and discussion comes close to approach A.
> >
> > ---
> >
> > I like the idea of having an event-count-based punctuate.
> >
> > ---
> >
> > I agree with the discussion around approach C, that we should provide the
> > user with the option to choose system-time or event-time based punctuates.
> > But I believe that the user predominantly wants to use event time while not
> > missing out on regular punctuates due to event delays or event absences.
> > Hence a complex punctuate option, as Matthias mentioned (quoted below),
> > would be most apt.
> >
> > "- We might want to add "complex" schedules later on (like, punctuate on
> 

Re: groupBy without auto-repartition topics for Kafka Streams

2017-03-02 Thread Tianji Li
Hi Guys,

Thanks so much for your quick replies, very appreciated!

Thanks
Tianji

On Wed, Mar 1, 2017 at 2:53 PM, Matthias J. Sax <matth...@confluent.io>
wrote:

> It should be:
>
> groupBy -> always trigger repartitioning
> groupByKey -> maybe trigger repartitioning
>
> And there will not be two repartitioning topics. The repartitioning will
> be done by the groupBy/groupByKey operation, and thus, in the
> aggregation step we know that data is correctly partitioned and there
> will be no second repartitioning topic.
>
>
>
> -Matthias
>
> On 3/1/17 11:25 AM, Michael Noll wrote:
> > FYI: The difference between `groupBy` (may trigger re-partitioning) vs.
> > `groupByKey` (does not trigger re-partitioning) also applies to:
> >
> > - `map` vs. `mapValues`
> > - `flatMap` vs. `flatMapValues`
> >
> >
> >
> > On Wed, Mar 1, 2017 at 8:15 PM, Damian Guy <damian@gmail.com> wrote:
> >
> >> If you use stream.groupByKey() then there will be no repartitioning as long
> >> as there have been no key-changing operations preceding it, i.e., map,
> >> selectKey, flatMap, transform. If you use stream.groupBy(...) then we see
> >> it as a key-changing operation, hence we need to repartition the data.
> >>
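A short illustration of the two cases, written with the current (post-1.0) Streams DSL names; the topic name, key transformation, and serdes are assumptions:

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.Topology;
    import org.apache.kafka.streams.kstream.Consumed;
    import org.apache.kafka.streams.kstream.Grouped;
    import org.apache.kafka.streams.kstream.KStream;

    public class GroupingExamples {
        public static Topology build() {
            StreamsBuilder builder = new StreamsBuilder();
            KStream<String, Long> events =
                builder.stream("events", Consumed.with(Serdes.String(), Serdes.Long()));

            // 1) Key untouched before grouping: groupByKey() adds no repartition topic.
            events.groupByKey(Grouped.with(Serdes.String(), Serdes.Long()))
                  .count();

            // 2) groupBy (or any key-changing operation before grouping): Streams must
            //    repartition so that equal keys meet on the same task, which creates an
            //    internal repartition topic in addition to the aggregation's changelog.
            events.groupBy((key, value) -> key.toUpperCase(),
                           Grouped.with(Serdes.String(), Serdes.Long()))
                  .count();

            return builder.build();
        }
    }
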
> >> On Wed, 1 Mar 2017 at 18:59 Tianji Li <skyah...@gmail.com> wrote:
> >>
> >>> Hi there,
> >>>
> >>> I wonder if it makes sense to give the option to disable auto
> >>> repartitioning while doing groupBy.
> >>>
> >>> I understand with https://issues.apache.org/jira/browse/KAFKA-3561,
> >>> an internal topic for repartition will be automatically created and
> >> synced
> >>> to brokers, which is useful when aggregation keys are not the ones used
> >>> when ingesting raw data.
> >>>
> >>> However, in my case, I have carefully partitioned the data when
> ingesting
> >>> my raw topics. If I do groupBy followed by aggregation, there will be TWO
> >>> change log topics, one for the groupBy and another for the aggregation.
> >>>
> >>> Does it make sense to make the groupBy one configurable?
> >>>
> >>> Thanks
> >>> Tianji
> >>>
> >>
> >
> >
> >
>
>


groupBy without auto-repartition topics for Kafka Streams

2017-03-01 Thread Tianji Li
Hi there,

I wonder if it makes sense to give the option to disable auto
repartitioning while doing groupBy.

I understand with https://issues.apache.org/jira/browse/KAFKA-3561,
an internal topic for repartition will be automatically created and synced
to brokers, which is useful when aggregation keys are not the ones used
when ingesting raw data.

However, in my case, I have carefully partitioned the data when ingesting
my raw topics. If I do groupBy followed by aggregation, there will be TWO
change log topics, one for the groupBy and another for the aggregation.

Does it make sense to make the groupBy one configurable?

Thanks
Tianji


Re: Kafka Connect / Access to OffsetStorageReader from SourceConnector

2017-02-22 Thread Tianji Li
Hi Florian,

Just curious, what 'shared storage' do you guys use to keep the files before
they are ingested into Kafka?

In our case, we could not figure out such a nice distributed+shared file
system that is not HDFS-like and runs before Kafka. So we use individual
hard disks on the connector machines and keep offsets etc. local on those
hard disks. That is, before Kafka Connect everything is disconnected, and
from Kafka Connect onward everything is on a distributed system.

Thanks
Tianji


On Mon, Feb 20, 2017 at 6:24 AM, Florian Hussonnois 
wrote:

> Hi Jason,
>
> Yes, this is the idea. The connector assigns a subset of files to each
> task.
>
> A task stores the file size, the byte offset and the byte size of the last
> sent record as its source offsets.
> A file is finished when recordBytesOffset + recordBytesSize = fileBytesSize.
>
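A sketch of that bookkeeping carried in a Connect SourceRecord's source offset; the map keys and helper are illustrative assumptions, not the actual connector code:

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.kafka.connect.data.Schema;
    import org.apache.kafka.connect.source.SourceRecord;

    public class FileOffsets {
        // Builds a SourceRecord whose source offset carries the per-file bookkeeping.
        static SourceRecord toSourceRecord(String filename, long fileBytesSize,
                                           long recordBytesOffset, long recordBytesSize,
                                           String topic, byte[] payload) {
            Map<String, Object> sourcePartition =
                Collections.<String, Object>singletonMap("filename", filename);

            Map<String, Object> sourceOffset = new HashMap<>();
            sourceOffset.put("fileBytesSize", fileBytesSize);
            sourceOffset.put("recordBytesOffset", recordBytesOffset);
            sourceOffset.put("recordBytesSize", recordBytesSize);

            // The file is finished once recordBytesOffset + recordBytesSize == fileBytesSize,
            // which is what the connector would like to observe via an OffsetStorageReader.
            return new SourceRecord(sourcePartition, sourceOffset, topic,
                                    Schema.BYTES_SCHEMA, payload);
        }
    }
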
> The connector should be able to start a background thread to track the
> offsets for each assigned file.
> When all tasks have finished, the connector can stop tasks or assign new
> files by requesting a task reconfiguration.
>
> Another advantage of monitoring source offsets from the connector is being
> able to detect slow or failed tasks and, if necessary, restart all tasks.
>
> Thanks,
>
> 2017-02-18 6:47 GMT+01:00 Jason Gustafson :
>
> > Hey Florian,
> >
> > Can you explain a bit more how having access to the offset storage from
> the
> > connector helps in your use case? I guess you are planning to use offsets
> > to be able to tell when a task has finished a file?
> >
> > Thanks,
> > Jason
> >
> > On Fri, Feb 17, 2017 at 4:45 AM, Florian Hussonnois <
> fhussonn...@gmail.com
> > >
> > wrote:
> >
> > > Hi Kafka Team,
> > >
> > > I'm developing a connector which needs to monitor the progress of its tasks
> > > in order to be able to request a task reconfiguration in some situations.
> > >
> > > Our connector is pretty simple. It's used to stream thousands of files
> > > into Kafka. The connector scans directories, then schedules each task with
> > > a set of assigned files.
> > > When tasks are no longer required or new files are detected, the connector
> > > requests a reconfiguration.
> > >
> > > In addition, files are stored in a shared storage which is accessible from
> > > each Connect worker. In that way, we can distribute file streaming.
> > >
> > > For that purpose, it would be very convenient to have access to an
> > > offsetStorageReader instance from either the Connector class or the
> > > ConnectorContext class.
> > >
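For reference, a task can already read back committed source offsets through its SourceTaskContext; the ask above is to expose the same reader on the Connector/ConnectorContext side. A small sketch of the task-level lookup (map keys are illustrative):

    import java.util.Collections;
    import java.util.Map;
    import org.apache.kafka.connect.source.SourceTaskContext;
    import org.apache.kafka.connect.storage.OffsetStorageReader;

    public class TaskOffsetLookup {
        // Reads back the committed byte offset for one file using the task-level API.
        static long committedRecordBytesOffset(SourceTaskContext context, String filename) {
            OffsetStorageReader reader = context.offsetStorageReader();
            Map<String, Object> offset =
                reader.offset(Collections.singletonMap("filename", filename));
            Object value = (offset == null) ? null : offset.get("recordBytesOffset");
            return (value == null) ? 0L : ((Number) value).longValue();
        }
    }
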
> > > I found a similar question:
> > > https://www.mail-archive.com/dev@kafka.apache.org/msg50579.html
> > >
> > > Do you think this improvement could be considered? I can contribute to it.
> > >
> > > Thanks,
> > >
> > > --
> > > Florian HUSSONNOIS
> > >
> >
>
>
>
> --
> Florian HUSSONNOIS
>


[jira] [Commented] (KAFKA-4400) Prefix for sink task consumer groups should be configurable

2016-11-16 Thread Tianji Li (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15671457#comment-15671457
 ] 

Tianji Li commented on KAFKA-4400:
--

[~ewencp] We were bitten by this very issue a few times, so I did a PR to fix 
it. Please let me know if it is OK.

> Prefix for sink task consumer groups should be configurable
> ---
>
> Key: KAFKA-4400
> URL: https://issues.apache.org/jira/browse/KAFKA-4400
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.10.1.0
>Reporter: Ewen Cheslack-Postava
>Assignee: Ewen Cheslack-Postava
>
> Currently the prefix for creating consumer groups is fixed. This means that 
> if you run multiple Connect clusters using the same Kafka cluster and create 
> connectors with the same name, sink tasks in different clusters will join the 
> same group. Making this prefix configurable at the worker level would protect 
> against this.
> An alternative would be to define unique cluster IDs for each connect 
> cluster, which would allow us to construct a unique name for the group 
> without requiring yet another config (but presents something of a 
> compatibility challenge).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-4371) Sporadic ConnectException shuts down the whole connect process

2016-11-03 Thread Tianji Li (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15632506#comment-15632506
 ] 

Tianji Li edited comment on KAFKA-4371 at 11/3/16 11:50 AM:


This seems like a bug in the https://github.com/confluentinc/kafka-connect-jdbc 
repository, rather than Kafka itself.

The SocketException (or rather IOException) at line 
https://github.com/confluentinc/kafka-connect-jdbc/blob/master/src/main/java/io/confluent/connect/jdbc/source/JdbcSourceTask.java#L211
 is not caught.


was (Author: skyahead):
This seems like a bug in the https://github.com/confluentinc/kafka-connect-jdbc 
repository, rather than Kafka itself.

The SQLException (or rather IOException) at line 
https://github.com/confluentinc/kafka-connect-jdbc/blob/master/src/main/java/io/confluent/connect/jdbc/source/JdbcSourceTask.java#L211
 is not caught.

> Sporadic ConnectException shuts down the whole connect process
> --
>
> Key: KAFKA-4371
> URL: https://issues.apache.org/jira/browse/KAFKA-4371
> Project: Kafka
>  Issue Type: Bug
>Reporter: Sagar Rao
>Priority: Critical
>
> I had set up a 2-node distributed kafka-connect process. Everything went well 
> and I could see a lot of data flowing into the relevant kafka topics.
> After some time, JDBCUtils.getCurrentTimeOnDB threw a ConnectException with 
> the following stacktrace:
> The last packet successfully received from the server was 792 milliseconds 
> ago.  The last packet sent successfully to the server was 286 milliseconds 
> ago. (io.confluent.connect.jdbc.source.JdbcSourceTask:234)
> [2016-11-02 12:42:06,116] ERROR Failed to get current time from DB using 
> query select CURRENT_TIMESTAMP; on database MySQL 
> (io.confluent.connect.jdbc.util.JdbcUtils:226)
> com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link 
> failure
> The last packet successfully received from the server was 1,855 milliseconds 
> ago.  The last packet sent successfully to the server was 557 milliseconds 
> ago.
>at sun.reflect.GeneratedConstructorAccessor51.newInstance(Unknown 
> Source)
>at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
>at 
> com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1117)
>at com.mysql.jdbc.MysqlIO.send(MysqlIO.java:3829)
>at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2449)
>at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2629)
>at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2719)
>at 
> com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2155)
>at 
> com.mysql.jdbc.PreparedStatement.execute(PreparedStatement.java:1379)
>at 
> com.mysql.jdbc.StatementImpl.createResultSetUsingServerFetch(StatementImpl.java:651)
>at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1527)
>at 
> io.confluent.connect.jdbc.util.JdbcUtils.getCurrentTimeOnDB(JdbcUtils.java:220)
>at 
> io.confluent.connect.jdbc.source.TimestampIncrementingTableQuerier.executeQuery(TimestampIncrementingTableQuerier.java:157)
>at 
> io.confluent.connect.jdbc.source.TableQuerier.maybeStartQuery(TableQuerier.java:78)
>at 
> io.confluent.connect.jdbc.source.TimestampIncrementingTableQuerier.maybeStartQuery(TimestampIncrementingTableQuerier.java:57)
>at 
> io.confluent.connect.jdbc.source.JdbcSourceTask.poll(JdbcSourceTask.java:207)
>at 
> org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:155)
>at 
> org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:140)
>at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:175)
>at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>at java.lang.Thread.run(Thread.java:745)
> Caused by: java.net.SocketException: Broken pipe (Write failed)
>at java.net.SocketOutputStream.socketWrite0(Native Method)
>at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
>at java.net.Soc

[jira] [Commented] (KAFKA-4371) Sporadic ConnectException shuts down the whole connect process

2016-11-03 Thread Tianji Li (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15632506#comment-15632506
 ] 

Tianji Li commented on KAFKA-4371:
--

This seems like a bug in the https://github.com/confluentinc/kafka-connect-jdbc 
repository, rather than Kafka itself.

The SQLException (or rather IOException) at line 
https://github.com/confluentinc/kafka-connect-jdbc/blob/master/src/main/java/io/confluent/connect/jdbc/source/JdbcSourceTask.java#L211
 is not caught.

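For illustration only (this is not the actual kafka-connect-jdbc code): the kind of guard the comment suggests, where a transient error thrown while polling is swallowed and retried instead of propagating and stopping the task:

    import java.util.List;
    import java.util.concurrent.Callable;
    import org.apache.kafka.connect.source.SourceRecord;

    public class RetryOnTransientError {
        // 'query' stands in for the database work done during poll(); purely illustrative.
        static List<SourceRecord> pollOnce(Callable<List<SourceRecord>> query) {
            try {
                return query.call();
            } catch (Exception e) { // e.g. a CommunicationsException wrapping a SocketException
                // Log and back off here; returning null from SourceTask.poll() lets the
                // framework simply call poll() again instead of failing the whole task.
                return null;
            }
        }
    }
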
> Sporadic ConnectException shuts down the whole connect process
> --
>
> Key: KAFKA-4371
> URL: https://issues.apache.org/jira/browse/KAFKA-4371
> Project: Kafka
>  Issue Type: Bug
>Reporter: Sagar Rao
>Priority: Critical
>
> I had set up a 2-node distributed kafka-connect process. Everything went well 
> and I could see a lot of data flowing into the relevant kafka topics.
> After some time, JDBCUtils.getCurrentTimeOnDB threw a ConnectException with 
> the following stacktrace:
> The last packet successfully received from the server was 792 milliseconds 
> ago.  The last packet sent successfully to the server was 286 milliseconds 
> ago. (io.confluent.connect.jdbc.source.JdbcSourceTask:234)
> [2016-11-02 12:42:06,116] ERROR Failed to get current time from DB using 
> query select CURRENT_TIMESTAMP; on database MySQL 
> (io.confluent.connect.jdbc.util.JdbcUtils:226)
> com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link 
> failure
> The last packet successfully received from the server was 1,855 milliseconds 
> ago.  The last packet sent successfully to the server was 557 milliseconds 
> ago.
>at sun.reflect.GeneratedConstructorAccessor51.newInstance(Unknown 
> Source)
>at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
>at 
> com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1117)
>at com.mysql.jdbc.MysqlIO.send(MysqlIO.java:3829)
>at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2449)
>at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2629)
>at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2719)
>at 
> com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2155)
>at 
> com.mysql.jdbc.PreparedStatement.execute(PreparedStatement.java:1379)
>at 
> com.mysql.jdbc.StatementImpl.createResultSetUsingServerFetch(StatementImpl.java:651)
>at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1527)
>at 
> io.confluent.connect.jdbc.util.JdbcUtils.getCurrentTimeOnDB(JdbcUtils.java:220)
>at 
> io.confluent.connect.jdbc.source.TimestampIncrementingTableQuerier.executeQuery(TimestampIncrementingTableQuerier.java:157)
>at 
> io.confluent.connect.jdbc.source.TableQuerier.maybeStartQuery(TableQuerier.java:78)
>at 
> io.confluent.connect.jdbc.source.TimestampIncrementingTableQuerier.maybeStartQuery(TimestampIncrementingTableQuerier.java:57)
>at 
> io.confluent.connect.jdbc.source.JdbcSourceTask.poll(JdbcSourceTask.java:207)
>at 
> org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:155)
>at 
> org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:140)
>at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:175)
>at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>at java.lang.Thread.run(Thread.java:745)
> Caused by: java.net.SocketException: Broken pipe (Write failed)
>at java.net.SocketOutputStream.socketWrite0(Native Method)
>at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
>at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
>at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
>at com.mysql.jdbc.MysqlIO.send(MysqlIO.java:3810)
>... 20 more
> This was just a minor glitch in the connection, as the EC2 instances are able 
> to connect to the MySQL Aurora instances.