Re: kafka connect(copycat) question

2015-12-10 Thread Ewen Cheslack-Postava
Roman,

Agreed, this is definitely a gap in the docs (both Kafka's and Confluent's)
right now. The reason it was lower priority for documentation than other
items is that we expect there will be relatively few converter
implementations, especially compared to the number of connectors.
Converters correspond to serialization formats (and any supporting pieces,
like Confluent's Schema Registry for the AvroConverter), so there might be
a few for, e.g., Avro, JSON, Protocol Buffers, Thrift, and possibly
variants (e.g. if you have a different approach for managing Avro schemas
than Confluent's schema registry).

https://cwiki.apache.org/confluence/display/KAFKA/Copycat+Data+API has a
slightly outdated image that explains how Converters fit into the data
processing pipeline in Kafka Connect. The API is also quite simple:
http://docs.confluent.io/2.0.0/connect/javadocs/org/apache/kafka/connect/storage/Converter.html
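
To give a concrete sense of that interface, here is a minimal, untested
sketch of a custom Converter (my own illustration, not code shipped with
Kafka; it just round-trips UTF-8 strings, much like the built-in
StringConverter):

import java.nio.charset.StandardCharsets;
import java.util.Map;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaAndValue;
import org.apache.kafka.connect.storage.Converter;

// Hypothetical converter that treats every message as a UTF-8 string.
public class Utf8StringConverter implements Converter {
    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
        // Nothing to configure in this sketch.
    }

    @Override
    public byte[] fromConnectData(String topic, Schema schema, Object value) {
        // Connect data -> raw bytes to be written to Kafka.
        return value == null ? null : value.toString().getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public SchemaAndValue toConnectData(String topic, byte[] value) {
        // Raw bytes from Kafka -> schema + value handed to connectors.
        if (value == null)
            return SchemaAndValue.NULL;
        return new SchemaAndValue(Schema.OPTIONAL_STRING_SCHEMA,
                new String(value, StandardCharsets.UTF_8));
    }
}

A worker would pick such a class up via the key.converter/value.converter
settings, assuming it is on the classpath.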

-Ewen

On Thu, Dec 10, 2015 at 3:34 AM, Roman Shtykh 
wrote:

> Ewen,
>
> I just thought it would be helpful to have more detailed information on
> converters (including what you described here) on
> http://docs.confluent.io/2.0.0/connect/devguide.html
>
> Thanks,
> Roman
>
>
>
> On Wednesday, November 11, 2015 6:59 AM, Ewen Cheslack-Postava <
> e...@confluent.io> wrote:
> Hi Venkatesh,
>
> If you're using the default settings included in the sample configs, it'll
> expect JSON data in a special format to support passing schemas along with
> the data. This is turned on by default because it makes it possible to work
> with a *lot* more connectors and data storage systems (many require
> schemas!), though it does mean consuming regular JSON data won't work out
> of the box. You can easily switch this off by changing these lines in the
> worker config:
>
> key.converter.schemas.enable=true
> value.converter.schemas.enable=true
>
> to be false instead. However, note that this will only work with connectors
> that can work with "schemaless" data. This wouldn't work for, e.g., writing
> Avro files in HDFS since they need schema information, but it might work
> for other formats. This would allow you to consume JSON data from any topic
> it already exists in.
>
> Note that JSON is not the only format you can use. You can also substitute
> other implementations of the Converter interface. Confluent has implemented
> an Avro version that works well with our schema registry (
> https://github.com/confluentinc/schema-registry/tree/master/avro-converter
> ).
> The JSON implementation made sense to add as the one included with Kafka
> simply because it didn't introduce any other dependencies that weren't
> already in Kafka. It's also possible to write implementations for other
> formats (e.g. Thrift, Protocol Buffers, Cap'n Proto, MessagePack, and
> more), but I'm not aware of anyone who has started to tackle those
> converters yet.
>
> -Ewen
>
> On Tue, Nov 10, 2015 at 1:23 PM, Venkatesh Rudraraju <
> venkatengineer...@gmail.com> wrote:
>
> > Hi,
> >
> > I am trying out the new kafka connect service.
> >
> > version: kafka_2.11-0.9.0.0
> > mode: standalone
> >
> > I have a conceptual question on the service.
> >
> > Can I just start a sink connector which reads from Kafka and writes to
> > say HDFS?
> > From what I have tried, it's expecting a source-connector as well because
> > the sink-connector is expecting a particular pattern of the message in
> > kafka-topic.
> >
> > Thanks,
> > Venkat
>
> >
>
>
>
> --
> Thanks,
> Ewen
>



-- 
Thanks,
Ewen


Re: kafka connect(copycat) question

2015-12-08 Thread Ewen Cheslack-Postava
Svante,

Just to clarify, the HDFS connector relies on some Avro translation code
which is in a separate repository. You need the
https://github.com/confluentinc/schema-registry repository built before the
kafka-connect-hdfs repository to get that dependency.
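
Concretely, that means something like the following (an untested sketch;
it assumes git and mvn are installed, the two checkouts are siblings, and
any remaining Confluent SNAPSHOT dependencies resolve):

git clone https://github.com/confluentinc/schema-registry.git
cd schema-registry
mvn clean install -DskipTests   # installs the Avro converter into ~/.m2
cd ../kafka-connect-hdfs
mvn clean package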

Confluent has now also released Confluent Platform 2.0.0, which includes
the connector -- you can download it here:
http://www.confluent.io/developer#download

-Ewen

On Thu, Dec 3, 2015 at 2:42 AM, Svante Karlsson 
wrote:

> Hi, I tried building this today and the problem seems to remain.
>
> /svante
>
>
>
> [INFO] Building kafka-connect-hdfs 2.0.0-SNAPSHOT
> [INFO] ------------------------------------------------------------------------
> Downloading:
>
> http://packages.confluent.io/maven/io/confluent/kafka-connect-avro-converter/2.0.0-SNAPSHOT/maven-metadata.xml
> Downloading:
>
> http://packages.confluent.io/maven/io/confluent/kafka-connect-avro-converter/2.0.0-SNAPSHOT/kafka-connect-avro-converter-2.0.0-SNAPSHOT.pom
> [WARNING] The POM for
> io.confluent:kafka-connect-avro-converter:jar:2.0.0-SNAPSHOT is missing, no
> dependency information available
> Downloading:
>
> http://packages.confluent.io/maven/io/confluent/common-config/2.0.0-SNAPSHOT/maven-metadata.xml
> Downloading:
>
> http://packages.confluent.io/maven/io/confluent/common-config/2.0.0-SNAPSHOT/common-config-2.0.0-SNAPSHOT.pom
> [WARNING] The POM for io.confluent:common-config:jar:2.0.0-SNAPSHOT is
> missing, no dependency information available
> Downloading:
>
> http://packages.confluent.io/maven/io/confluent/kafka-connect-avro-converter/2.0.0-SNAPSHOT/kafka-connect-avro-converter-2.0.0-SNAPSHOT.jar
> Downloading:
>
> http://packages.confluent.io/maven/io/confluent/common-config/2.0.0-SNAPSHOT/common-config-2.0.0-SNAPSHOT.jar
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD FAILURE
>
>
> > --
> > Thanks,
> > Ewen
> >
>



-- 
Thanks,
Ewen


Re: kafka connect(copycat) question

2015-12-03 Thread Svante Karlsson
Hi, I tried building this today and the problem seems to remain.

/svante



[INFO] Building kafka-connect-hdfs 2.0.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
Downloading:
http://packages.confluent.io/maven/io/confluent/kafka-connect-avro-converter/2.0.0-SNAPSHOT/maven-metadata.xml
Downloading:
http://packages.confluent.io/maven/io/confluent/kafka-connect-avro-converter/2.0.0-SNAPSHOT/kafka-connect-avro-converter-2.0.0-SNAPSHOT.pom
[WARNING] The POM for
io.confluent:kafka-connect-avro-converter:jar:2.0.0-SNAPSHOT is missing, no
dependency information available
Downloading:
http://packages.confluent.io/maven/io/confluent/common-config/2.0.0-SNAPSHOT/maven-metadata.xml
Downloading:
http://packages.confluent.io/maven/io/confluent/common-config/2.0.0-SNAPSHOT/common-config-2.0.0-SNAPSHOT.pom
[WARNING] The POM for io.confluent:common-config:jar:2.0.0-SNAPSHOT is
missing, no dependency information available
Downloading:
http://packages.confluent.io/maven/io/confluent/kafka-connect-avro-converter/2.0.0-SNAPSHOT/kafka-connect-avro-converter-2.0.0-SNAPSHOT.jar
Downloading:
http://packages.confluent.io/maven/io/confluent/common-config/2.0.0-SNAPSHOT/common-config-2.0.0-SNAPSHOT.jar
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE


> --
> Thanks,
> Ewen
>


Re: kafka connect(copycat) question

2015-11-16 Thread Ewen Cheslack-Postava
Sorry, there was an out-of-date reference in the pom.xml; the version on
master should build fine now.

-Ewen

On Sat, Nov 14, 2015 at 1:54 PM, Venkatesh Rudraraju <
venkatengineer...@gmail.com> wrote:

> > I tried building copycat-hdfs but it's not able to pull dependencies from
> maven...
>
> error trace:
> ---
>  Failed to execute goal on project kafka-connect-hdfs: Could not resolve
> dependencies for project
> io.confluent:kafka-connect-hdfs:jar:2.0.0-SNAPSHOT: The following artifacts
> could not be resolved: org.apache.kafka:connect-api:jar:0.9.0.0,
> io.confluent:kafka-connect-avro-converter:jar:2.0.0-SNAPSHOT,
> io.confluent:common-config:jar:2.0.0-SNAPSHOT: Could not find artifact
> org.apache.kafka:connect-api:jar:0.9.0.0 in confluent
>
> On Thu, Nov 12, 2015 at 2:59 PM, Ewen Cheslack-Postava 
> wrote:
>
> > Yes, though it's still awaiting some updates after some renaming and API
> > modifications that happened in Kafka recently.
> >
> > -Ewen
> >
> > On Thu, Nov 12, 2015 at 9:10 AM, Venkatesh Rudraraju <
> > venkatengineer...@gmail.com> wrote:
> >
> > > Ewen,
> > >
> > > How do I use an HDFSSinkConnector? I see the sink as part of a Confluent
> > > project (
> > > https://github.com/confluentinc/copycat-hdfs/blob/master/src/main/java/io/confluent/copycat/hdfs/HdfsSinkConnector.java
> > > ).
> > > Does it mean that I build this project and add the jar to kafka libs?
> > >
> > >
> > >
> > >
> > > On Tue, Nov 10, 2015 at 9:35 PM, Ewen Cheslack-Postava <
> > e...@confluent.io>
> > > wrote:
> > >
> > > > Venkatesh,
> > > >
> > > > 1. It only works with quotes because the message needs to be parsed
> > > > as JSON -- a bare string without quotes is not valid JSON. If you're
> > > > just using a file sink, you can also try the StringConverter, which
> > > > only supports strings and uses a fixed schema, but is also very easy
> > > > to use since it has minimal requirements. It's really meant for
> > > > demonstration purposes more than anything else, but may be helpful
> > > > just to get up and running.
> > > > 2. Which JsonParser error? When processing a message fails, we need
> > > > to be careful about how we handle it. Currently it will not proceed
> > > > if it can't process a message since for a lot of applications it
> > > > isn't acceptable to drop messages. By default, we want at least once
> > > > semantics, with exactly once as long as we don't encounter any
> > > > crashes or network errors. Manual intervention is currently required
> > > > in that case.
> > > >
> > > > -Ewen
> > > >
> > > > On Tue, Nov 10, 2015 at 8:58 PM, Venkatesh Rudraraju <
> > > > venkatengineer...@gmail.com> wrote:
> > > >
> > > > > Hi Ewen,
> > > > >
> > > > > Thanks for the explanation. With your suggested setting, I was able
> > > > > to start just a sink connector like below:
> > > > >
> > > > > >* bin/connect-standalone.sh config/connect-standalone.properties
> > > > > config/connect-file-sink.properties*
> > > > >
> > > > > But I still have a couple of issues:
> > > > > 1) Since I am only testing a simple file sink connector, I am
> > > > > manually producing some messages to the 'connect-test' kafka topic,
> > > > > which the sink task is reading from. And it works only if the
> > > > > message is within double-quotes.
> > > > > 2) Once I hit the above JsonParser error on the SinkTask, the
> > > > > connector hangs and doesn't take any more messages, even proper ones.
> > > > >
> > > > >
> > > > > On Tue, Nov 10, 2015 at 1:59 PM, Ewen Cheslack-Postava <
> > > > e...@confluent.io>
> > > > > wrote:
> > > > >
> > > > > > Hi Venkatesh,
> > > > > >
> > > > > > If you're using the default settings included in the sample
> > > > > > configs, it'll expect JSON data in a special format to support
> > > > > > passing schemas along with the data. This is turned on by default
> > > > > > because it makes it possible to work with a *lot* more connectors
> > > > > > and data storage systems (many require schemas!), though it does
> > > > > > mean consuming regular JSON data won't work out of the box. You
> > > > > > can easily switch this off by changing these lines in the worker
> > > > > > config:
> > > > > >
> > > > > > key.converter.schemas.enable=true
> > > > > > value.converter.schemas.enable=true
> > > > > >
> > > > > > to be false instead. However, note that this will only work with
> > > > > > connectors that can work with "schemaless" data. This wouldn't
> > > > > > work for, e.g., writing Avro files in HDFS since they need schema
> > > > > > information, but it might work for other formats. This would allow
> > > > > > you to consume JSON data from any topic it already exists in.
> > > > > >
> > > > > > Note that JSON is not the only format you can use. You can also
> > > > > > substitute other 

Re: kafka connect(copycat) question

2015-11-14 Thread Venkatesh Rudraraju
I tried building copycat-hdfs but it's not able to pull dependencies from
maven...

error trace:
---
 Failed to execute goal on project kafka-connect-hdfs: Could not resolve
dependencies for project
io.confluent:kafka-connect-hdfs:jar:2.0.0-SNAPSHOT: The following artifacts
could not be resolved: org.apache.kafka:connect-api:jar:0.9.0.0,
io.confluent:kafka-connect-avro-converter:jar:2.0.0-SNAPSHOT,
io.confluent:common-config:jar:2.0.0-SNAPSHOT: Could not find artifact
org.apache.kafka:connect-api:jar:0.9.0.0 in confluent

On Thu, Nov 12, 2015 at 2:59 PM, Ewen Cheslack-Postava 
wrote:

> Yes, though it's still awaiting some updates after some renaming and API
> modifications that happened in Kafka recently.
>
> -Ewen
>
> On Thu, Nov 12, 2015 at 9:10 AM, Venkatesh Rudraraju <
> venkatengineer...@gmail.com> wrote:
>
> > Ewen,
> >
> > How do I use an HDFSSinkConnector? I see the sink as part of a Confluent
> > project (
> > https://github.com/confluentinc/copycat-hdfs/blob/master/src/main/java/io/confluent/copycat/hdfs/HdfsSinkConnector.java
> > ).
> > Does it mean that I build this project and add the jar to kafka libs?
> >
> >
> >
> >
> > On Tue, Nov 10, 2015 at 9:35 PM, Ewen Cheslack-Postava <
> e...@confluent.io>
> > wrote:
> >
> > > Venkatesh,
> > >
> > > 1. It only works with quotes because the message needs to be parsed as
> > > JSON -- a bare string without quotes is not valid JSON. If you're just
> > > using a file sink, you can also try the StringConverter, which only
> > > supports strings and uses a fixed schema, but is also very easy to use
> > > since it has minimal requirements. It's really meant for demonstration
> > > purposes more than anything else, but may be helpful just to get up and
> > > running.
> > > 2. Which JsonParser error? When processing a message fails, we need to
> > > be careful about how we handle it. Currently it will not proceed if it
> > > can't process a message since for a lot of applications it isn't
> > > acceptable to drop messages. By default, we want at least once
> > > semantics, with exactly once as long as we don't encounter any crashes
> > > or network errors. Manual intervention is currently required in that
> > > case.
> > >
> > > -Ewen
> > >
> > > On Tue, Nov 10, 2015 at 8:58 PM, Venkatesh Rudraraju <
> > > venkatengineer...@gmail.com> wrote:
> > >
> > > > Hi Ewen,
> > > >
> > > > Thanks for the explanation. With your suggested setting, I was able
> > > > to start just a sink connector like below:
> > > >
> > > > >* bin/connect-standalone.sh config/connect-standalone.properties
> > > > config/connect-file-sink.properties*
> > > >
> > > > But I still have a couple of issues:
> > > > 1) Since I am only testing a simple file sink connector, I am
> > > > manually producing some messages to the 'connect-test' kafka topic,
> > > > which the sink task is reading from. And it works only if the message
> > > > is within double-quotes.
> > > > 2) Once I hit the above JsonParser error on the SinkTask, the
> > > > connector hangs and doesn't take any more messages, even proper ones.
> > > >
> > > >
> > > > On Tue, Nov 10, 2015 at 1:59 PM, Ewen Cheslack-Postava <
> > > e...@confluent.io>
> > > > wrote:
> > > >
> > > > > Hi Venkatesh,
> > > > >
> > > > > If you're using the default settings included in the sample
> > > > > configs, it'll expect JSON data in a special format to support
> > > > > passing schemas along with the data. This is turned on by default
> > > > > because it makes it possible to work with a *lot* more connectors
> > > > > and data storage systems (many require schemas!), though it does
> > > > > mean consuming regular JSON data won't work out of the box. You can
> > > > > easily switch this off by changing these lines in the worker config:
> > > > >
> > > > > key.converter.schemas.enable=true
> > > > > value.converter.schemas.enable=true
> > > > >
> > > > > to be false instead. However, note that this will only work with
> > > > > connectors that can work with "schemaless" data. This wouldn't work
> > > > > for, e.g., writing Avro files in HDFS since they need schema
> > > > > information, but it might work for other formats. This would allow
> > > > > you to consume JSON data from any topic it already exists in.
> > > > >
> > > > > Note that JSON is not the only format you can use. You can also
> > > > > substitute other implementations of the Converter interface.
> > > > > Confluent has implemented an Avro version that works well with our
> > > > > schema registry (
> > > > > https://github.com/confluentinc/schema-registry/tree/master/avro-converter
> > > > > ).
> > > > > The JSON implementation made sense to add as the one included with
> > > > > Kafka simply because it didn't introduce any other dependencies
> > > > > that weren't already in Kafka. It's also possible to write 

Re: kafka connect(copycat) question

2015-11-12 Thread Ewen Cheslack-Postava
Yes, though it's still awaiting some updates after some renaming and API
modifications that happened in Kafka recently.
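
In the meantime, one way to wire a locally built connector jar into the
standalone worker is via the classpath, roughly like this (the path and
the properties file name here are made up for illustration):

export CLASSPATH=/path/to/kafka-connect-hdfs/target/kafka-connect-hdfs-2.0.0-SNAPSHOT.jar
bin/connect-standalone.sh config/connect-standalone.properties config/hdfs-sink.properties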

-Ewen

On Thu, Nov 12, 2015 at 9:10 AM, Venkatesh Rudraraju <
venkatengineer...@gmail.com> wrote:

> Ewen,
>
> How do I use an HDFSSinkConnector? I see the sink as part of a Confluent
> project (
> https://github.com/confluentinc/copycat-hdfs/blob/master/src/main/java/io/confluent/copycat/hdfs/HdfsSinkConnector.java
> ).
> Does it mean that I build this project and add the jar to kafka libs?
>
>
>
>
> On Tue, Nov 10, 2015 at 9:35 PM, Ewen Cheslack-Postava 
> wrote:
>
> > Venkatesh,
> >
> > 1. It only works with quotes because the message needs to be parsed as
> > JSON -- a bare string without quotes is not valid JSON. If you're just
> > using a file sink, you can also try the StringConverter, which only
> > supports strings and uses a fixed schema, but is also very easy to use
> > since it has minimal requirements. It's really meant for demonstration
> > purposes more than anything else, but may be helpful just to get up and
> > running.
> > 2. Which JsonParser error? When processing a message fails, we need to be
> > careful about how we handle it. Currently it will not proceed if it can't
> > process a message since for a lot of applications it isn't acceptable to
> > drop messages. By default, we want at least once semantics, with exactly
> > once as long as we don't encounter any crashes or network errors. Manual
> > intervention is currently required in that case.
> >
> > -Ewen
> >
> > On Tue, Nov 10, 2015 at 8:58 PM, Venkatesh Rudraraju <
> > venkatengineer...@gmail.com> wrote:
> >
> > > Hi Ewen,
> > >
> > > Thanks for the explanation. With your suggested setting, I was able to
> > > start just a sink connector like below:
> > >
> > > >* bin/connect-standalone.sh config/connect-standalone.properties
> > > config/connect-file-sink.properties*
> > >
> > > But I still have a couple of issues:
> > > 1) Since I am only testing a simple file sink connector, I am manually
> > > producing some messages to the 'connect-test' kafka topic, which the
> > > sink task is reading from. And it works only if the message is within
> > > double-quotes.
> > > 2) Once I hit the above JsonParser error on the SinkTask, the connector
> > > hangs and doesn't take any more messages, even proper ones.
> > >
> > >
> > > On Tue, Nov 10, 2015 at 1:59 PM, Ewen Cheslack-Postava <
> > e...@confluent.io>
> > > wrote:
> > >
> > > > Hi Venkatesh,
> > > >
> > > > If you're using the default settings included in the sample configs,
> > > > it'll expect JSON data in a special format to support passing schemas
> > > > along with the data. This is turned on by default because it makes it
> > > > possible to work with a *lot* more connectors and data storage systems
> > > > (many require schemas!), though it does mean consuming regular JSON
> > > > data won't work out of the box. You can easily switch this off by
> > > > changing these lines in the worker config:
> > > >
> > > > key.converter.schemas.enable=true
> > > > value.converter.schemas.enable=true
> > > >
> > > > to be false instead. However, note that this will only work with
> > > > connectors that can work with "schemaless" data. This wouldn't work
> > > > for, e.g., writing Avro files in HDFS since they need schema
> > > > information, but it might work for other formats. This would allow you
> > > > to consume JSON data from any topic it already exists in.
> > > >
> > > > Note that JSON is not the only format you can use. You can also
> > > > substitute other implementations of the Converter interface. Confluent
> > > > has implemented an Avro version that works well with our schema
> > > > registry (
> > > > https://github.com/confluentinc/schema-registry/tree/master/avro-converter
> > > > ).
> > > > The JSON implementation made sense to add as the one included with
> > > > Kafka simply because it didn't introduce any other dependencies that
> > > > weren't already in Kafka. It's also possible to write implementations
> > > > for other formats (e.g. Thrift, Protocol Buffers, Cap'n Proto,
> > > > MessagePack, and more), but I'm not aware of anyone who has started to
> > > > tackle those converters yet.
> > > >
> > > > -Ewen
> > > >
> > > > On Tue, Nov 10, 2015 at 1:23 PM, Venkatesh Rudraraju <
> > > > venkatengineer...@gmail.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I am trying out the new kafka connect service.
> > > > >
> > > > > version: kafka_2.11-0.9.0.0
> > > > > mode: standalone
> > > > >
> > > > > I have a conceptual question on the service.
> > > > >
> > > > > Can I just start a sink connector which reads from Kafka and writes
> > > > > to say HDFS?
> > > > > From what I have tried, it's expecting a source-connector as well
> > > > > because the sink-connector is expecting a particular pattern of the
> > > > > message in

Re: kafka connect(copycat) question

2015-11-10 Thread Ewen Cheslack-Postava
Hi Venkatesh,

If you're using the default settings included in the sample configs, it'll
expect JSON data in a special format to support passing schemas along with
the data. This is turned on by default because it makes it possible to work
with a *lot* more connectors and data storage systems (many require
schemas!), though it does mean consuming regular JSON data won't work out
of the box. You can easily switch this off by changing these lines in the
worker config:

key.converter.schemas.enable=true
value.converter.schemas.enable=true

to be false instead. However, note that this will only work with connectors
that can work with "schemaless" data. This wouldn't work for, e.g., writing
Avro files in HDFS since they need schema information, but it might work
for other formats. This would allow you to consume JSON data from any topic
it already exists in.
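
For reference, with schemas enabled the JsonConverter expects each message
wrapped in a schema-plus-payload envelope, along these lines (an
illustrative example I typed up, not captured output):

{"schema": {"type": "string", "optional": false}, "payload": "hello"}

With schemas.enable=false it accepts plain JSON values such as
{"name": "hello"} instead.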

Note that JSON is not the only format you can use. You can also substitute
other implementations of the Converter interface. Confluent has implemented
an Avro version that works well with our schema registry (
https://github.com/confluentinc/schema-registry/tree/master/avro-converter).
The JSON implementation made sense to add as the one included with Kafka
simply because it didn't introduce any other dependencies that weren't
already in Kafka. It's also possible to write implementations for other
formats (e.g. Thrift, Protocol Buffers, Cap'n Proto, MessagePack, and
more), but I'm not aware of anyone who has started to tackle those
converters yet.

-Ewen

On Tue, Nov 10, 2015 at 1:23 PM, Venkatesh Rudraraju <
venkatengineer...@gmail.com> wrote:

> Hi,
>
> I am trying out the new kafka connect service.
>
> version: kafka_2.11-0.9.0.0
> mode: standalone
>
> I have a conceptual question on the service.
>
> Can I just start a sink connector which reads from Kafka and writes to say
> HDFS?
> From what I have tried, it's expecting a source-connector as well because
> the sink-connector is expecting a particular pattern of the message in
> kafka-topic.
>
> Thanks,
> Venkat
>



-- 
Thanks,
Ewen


Re: kafka connect(copycat) question

2015-11-10 Thread Venkatesh Rudraraju
Hi Ewen,

Thanks for the explanation. With your suggested setting, I was able to
start just a sink connector like below:

>* bin/connect-standalone.sh config/connect-standalone.properties
config/connect-file-sink.properties*
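
(For anyone following along, the file sink config I'm using is essentially
the stock example; this is a sketch from memory, so your copy may differ:

name=local-file-sink
connector.class=org.apache.kafka.connect.file.FileStreamSinkConnector
tasks.max=1
file=test.sink.txt
topics=connect-test
)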

But I still have a couple of issues:
1) Since I am only testing a simple file sink connector, I am manually
producing some messages to the 'connect-test' kafka topic, which the
sink task is reading from. And it works only if the message is within
double-quotes.
2) Once I hit the above JsonParser error on the SinkTask, the connector
hangs and doesn't take any more messages, even proper ones.
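
To reproduce issue 1, I produce messages with the console producer; only
the JSON-quoted message gets through (the topic matches the sink config):

bin/kafka-console-producer.sh --broker-list localhost:9092 --topic connect-test
"hello world"     <- accepted (valid JSON string)
hello world       <- triggers the JsonParser error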


On Tue, Nov 10, 2015 at 1:59 PM, Ewen Cheslack-Postava 
wrote:

> Hi Venkatesh,
>
> If you're using the default settings included in the sample configs, it'll
> expect JSON data in a special format to support passing schemas along with
> the data. This is turned on by default because it makes it possible to work
> with a *lot* more connectors and data storage systems (many require
> schemas!), though it does mean consuming regular JSON data won't work out
> of the box. You can easily switch this off by changing these lines in the
> worker config:
>
> key.converter.schemas.enable=true
> value.converter.schemas.enable=true
>
> to be false instead. However, note that this will only work with connectors
> that can work with "schemaless" data. This wouldn't work for, e.g., writing
> Avro files in HDFS since they need schema information, but it might work
> for other formats. This would allow you to consume JSON data from any topic
> it already exists in.
>
> Note that JSON is not the only format you can use. You can also substitute
> other implementations of the Converter interface. Confluent has implemented
> an Avro version that works well with our schema registry (
> https://github.com/confluentinc/schema-registry/tree/master/avro-converter
> ).
> The JSON implementation made sense to add as the one included with Kafka
> simply because it didn't introduce any other dependencies that weren't
> already in Kafka. It's also possible to write implementations for other
> formats (e.g. Thrift, Protocol Buffers, Cap'n Proto, MessagePack, and
> more), but I'm not aware of anyone who has started to tackle those
> converters yet.
>
> -Ewen
>
> On Tue, Nov 10, 2015 at 1:23 PM, Venkatesh Rudraraju <
> venkatengineer...@gmail.com> wrote:
>
> > Hi,
> >
> > I am trying out the new kafka connect service.
> >
> > version: kafka_2.11-0.9.0.0
> > mode: standalone
> >
> > I have a conceptual question on the service.
> >
> > Can I just start a sink connector which reads from Kafka and writes to
> > say HDFS?
> > From what I have tried, it's expecting a source-connector as well because
> > the sink-connector is expecting a particular pattern of the message in
> > kafka-topic.
> >
> > Thanks,
> > Venkat
> >
>
>
>
> --
> Thanks,
> Ewen
>



-- 
Victory awaits him who has everything in order--luck, people call it.


Re: kafka connect(copycat) question

2015-11-10 Thread Ewen Cheslack-Postava
Venkatesh,

1. It only works with quotes because the message needs to be parsed as JSON
-- a bare string without quotes is not valid JSON. If you're just using a
file sink, you can also try the StringConverter, which only supports
strings and uses a fixed schema, but is also very easy to use since it has
minimal requirements. It's really meant for demonstration purposes more
than anything else, but may be helpful just to get up and running.
2. Which JsonParser error? When processing a message fails, we need to be
careful about how we handle it. Currently it will not proceed if it can't
process a message since for a lot of applications it isn't acceptable to
drop messages. By default, we want at least once semantics, with exactly
once as long as we don't encounter any crashes or network errors. Manual
intervention is currently required in that case.
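
For point 1, switching to the StringConverter is just a worker config
change, roughly (a sketch using Kafka's built-in class):

key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.storage.StringConverter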

-Ewen

On Tue, Nov 10, 2015 at 8:58 PM, Venkatesh Rudraraju <
venkatengineer...@gmail.com> wrote:

> Hi Ewen,
>
> Thanks for the explanation. With your suggested setting, I was able to
> start just a sink connector like below:
>
> >* bin/connect-standalone.sh config/connect-standalone.properties
> config/connect-file-sink.properties*
>
> But I still have a couple of issues:
> 1) Since I am only testing a simple file sink connector, I am manually
> producing some messages to the 'connect-test' kafka topic, which the
> sink task is reading from. And it works only if the message is within
> double-quotes.
> 2) Once I hit the above JsonParser error on the SinkTask, the connector
> hangs and doesn't take any more messages, even proper ones.
>
>
> On Tue, Nov 10, 2015 at 1:59 PM, Ewen Cheslack-Postava 
> wrote:
>
> > Hi Venkatesh,
> >
> > If you're using the default settings included in the sample configs,
> > it'll expect JSON data in a special format to support passing schemas
> > along with the data. This is turned on by default because it makes it
> > possible to work with a *lot* more connectors and data storage systems
> > (many require schemas!), though it does mean consuming regular JSON data
> > won't work out of the box. You can easily switch this off by changing
> > these lines in the worker config:
> >
> > key.converter.schemas.enable=true
> > value.converter.schemas.enable=true
> >
> > to be false instead. However, note that this will only work with
> > connectors that can work with "schemaless" data. This wouldn't work for,
> > e.g., writing Avro files in HDFS since they need schema information, but
> > it might work for other formats. This would allow you to consume JSON
> > data from any topic it already exists in.
> >
> > Note that JSON is not the only format you can use. You can also
> > substitute other implementations of the Converter interface. Confluent
> > has implemented an Avro version that works well with our schema registry (
> > https://github.com/confluentinc/schema-registry/tree/master/avro-converter
> > ).
> > The JSON implementation made sense to add as the one included with Kafka
> > simply because it didn't introduce any other dependencies that weren't
> > already in Kafka. It's also possible to write implementations for other
> > formats (e.g. Thrift, Protocol Buffers, Cap'n Proto, MessagePack, and
> > more), but I'm not aware of anyone who has started to tackle those
> > converters yet.
> >
> > -Ewen
> >
> > On Tue, Nov 10, 2015 at 1:23 PM, Venkatesh Rudraraju <
> > venkatengineer...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I am trying out the new kafka connect service.
> > >
> > > version: kafka_2.11-0.9.0.0
> > > mode: standalone
> > >
> > > I have a conceptual question on the service.
> > >
> > > Can I just start a sink connector which reads from Kafka and writes
> > > to say HDFS?
> > > From what I have tried, it's expecting a source-connector as well
> > > because the sink-connector is expecting a particular pattern of the
> > > message in kafka-topic.
> > >
> > > Thanks,
> > > Venkat
> > >
> >
> >
> >
> > --
> > Thanks,
> > Ewen
> >
>
>
>
> --
> Victory awaits him who has everything in order--luck, people call it.
>



-- 
Thanks,
Ewen