Re: kafka connect (copycat) question
Roman,

Agreed, this is definitely a gap in the docs (both Kafka's and Confluent's)
right now. The reason it was lower priority for documentation than other
items is that we expect there will be relatively few converter
implementations, especially compared to the number of connectors.
Converters correspond to serialization formats (and any supporting pieces,
like Confluent's Schema Registry for the AvroConverter), so there might be
a few for, e.g., Avro, JSON, Protocol Buffers, Thrift, and possibly
variants (e.g. if you have a different approach for managing Avro schemas
than Confluent's schema registry).

https://cwiki.apache.org/confluence/display/KAFKA/Copycat+Data+API has a
slightly outdated image that explains how Converters fit into the data
processing pipeline in Kafka Connect. The API is also quite simple:
http://docs.confluent.io/2.0.0/connect/javadocs/org/apache/kafka/connect/storage/Converter.html

-Ewen

On Thu, Dec 10, 2015 at 3:34 AM, Roman Shtykh wrote:

> Ewen,
>
> I just thought it would be helpful to have more detailed information on
> converters (including what you described here) on
> http://docs.confluent.io/2.0.0/connect/devguide.html
>
> Thanks,
> Roman
>
>
> On Wednesday, November 11, 2015 6:59 AM, Ewen Cheslack-Postava <
> e...@confluent.io> wrote:
>
> Hi Venkatesh,
>
> If you're using the default settings included in the sample configs, it'll
> expect JSON data in a special format to support passing schemas along with
> the data. This is turned on by default because it makes it possible to
> work with a *lot* more connectors and data storage systems (many require
> schemas!), though it does mean consuming regular JSON data won't work out
> of the box. You can easily switch this off by changing these lines in the
> worker config:
>
> key.converter.schemas.enable=true
> value.converter.schemas.enable=true
>
> to be false instead. However, note that this will only work with
> connectors that can work with "schemaless" data. This wouldn't work for,
> e.g., writing Avro files in HDFS since they need schema information, but
> it might work for other formats. This would allow you to consume JSON
> data from any topic in which it already exists.
>
> Note that JSON is not the only format you can use. You can also
> substitute other implementations of the Converter interface. Confluent
> has implemented an Avro version that works well with our schema registry (
> https://github.com/confluentinc/schema-registry/tree/master/avro-converter
> ).
> The JSON implementation made sense to add as the one included with Kafka
> simply because it didn't introduce any other dependencies that weren't
> already in Kafka. It's also possible to write implementations for other
> formats (e.g. Thrift, Protocol Buffers, Cap'n Proto, MessagePack, and
> more), but I'm not aware of anyone who has started to tackle those
> converters yet.
>
> -Ewen
>
> On Tue, Nov 10, 2015 at 1:23 PM, Venkatesh Rudraraju <
> venkatengineer...@gmail.com> wrote:
>
> > Hi,
> >
> > I am trying out the new kafka connect service.
> >
> > version : kafka_2.11-0.9.0.0
> > mode: standalone
> >
> > I have a conceptual question on the service.
> >
> > Can I just start a sink connector which reads from Kafka and writes
> > to, say, HDFS?
> > From what I have tried, it's expecting a source-connector as well
> > because the sink-connector is expecting a particular pattern of the
> > message in kafka-topic.
> >
> > Thanks,
> > Venkat
>
> --
> Thanks,
> Ewen

--
Thanks,
Ewen
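The Converter interface linked above boils down to three methods. A minimal
sketch of a custom implementation (a hypothetical Utf8StringConverter that
treats every message as a UTF-8 string -- not one of the converters shipped
with Kafka) might look like this:

    import java.nio.charset.StandardCharsets;
    import java.util.Map;
    import org.apache.kafka.connect.data.Schema;
    import org.apache.kafka.connect.data.SchemaAndValue;
    import org.apache.kafka.connect.storage.Converter;

    // Hypothetical example: bridges raw Kafka bytes and Connect's data
    // API by treating every message as a UTF-8 string.
    public class Utf8StringConverter implements Converter {
        @Override
        public void configure(Map<String, ?> configs, boolean isKey) {
            // Nothing to configure in this sketch.
        }

        @Override
        public byte[] fromConnectData(String topic, Schema schema, Object value) {
            // Connect data -> bytes written to the Kafka topic.
            return value == null ? null : value.toString().getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public SchemaAndValue toConnectData(String topic, byte[] value) {
            // Bytes read from Kafka -> schema'd Connect data.
            if (value == null)
                return SchemaAndValue.NULL;
            return new SchemaAndValue(Schema.STRING_SCHEMA,
                                      new String(value, StandardCharsets.UTF_8));
        }
    }

Once such a class is on the worker's classpath, it would be selected via the
key.converter and value.converter worker properties.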
Re: kafka connect (copycat) question
Svante,

Just to clarify, the HDFS connector relies on some Avro translation code
which is in a separate repository. You need the
https://github.com/confluentinc/schema-registry repository built before the
kafka-connect-hdfs repository to get that dependency. Confluent has now
also released Confluent Platform 2.0.0, which includes the connector --
you can download it here: http://www.confluent.io/developer#download

-Ewen

On Thu, Dec 3, 2015 at 2:42 AM, Svante Karlsson wrote:

> Hi, I tried building this today and the problem seems to remain.
>
> /svante
>
> [INFO] Building kafka-connect-hdfs 2.0.0-SNAPSHOT
> [INFO]
> Downloading:
> http://packages.confluent.io/maven/io/confluent/kafka-connect-avro-converter/2.0.0-SNAPSHOT/maven-metadata.xml
> Downloading:
> http://packages.confluent.io/maven/io/confluent/kafka-connect-avro-converter/2.0.0-SNAPSHOT/kafka-connect-avro-converter-2.0.0-SNAPSHOT.pom
> [WARNING] The POM for
> io.confluent:kafka-connect-avro-converter:jar:2.0.0-SNAPSHOT is missing,
> no dependency information available
> Downloading:
> http://packages.confluent.io/maven/io/confluent/common-config/2.0.0-SNAPSHOT/maven-metadata.xml
> Downloading:
> http://packages.confluent.io/maven/io/confluent/common-config/2.0.0-SNAPSHOT/common-config-2.0.0-SNAPSHOT.pom
> [WARNING] The POM for io.confluent:common-config:jar:2.0.0-SNAPSHOT is
> missing, no dependency information available
> Downloading:
> http://packages.confluent.io/maven/io/confluent/kafka-connect-avro-converter/2.0.0-SNAPSHOT/kafka-connect-avro-converter-2.0.0-SNAPSHOT.jar
> Downloading:
> http://packages.confluent.io/maven/io/confluent/common-config/2.0.0-SNAPSHOT/common-config-2.0.0-SNAPSHOT.jar
> [INFO]
> [INFO] BUILD FAILURE
>
> > --
> > Thanks,
> > Ewen

--
Thanks,
Ewen
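In concrete terms, the build order Ewen suggests looks roughly like this (a
sketch only -- exact Maven goals and branches may differ, and the
common-config artifact may additionally require Confluent's separate
"common" repository to be built first):

    git clone https://github.com/confluentinc/schema-registry.git
    cd schema-registry
    mvn install -DskipTests    # installs kafka-connect-avro-converter into the local Maven repo
    cd ../kafka-connect-hdfs
    mvn package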
Re: kafka connect (copycat) question
Hi, I tried building this today and the problem seems to remain.

/svante

[INFO] Building kafka-connect-hdfs 2.0.0-SNAPSHOT
[INFO]
Downloading:
http://packages.confluent.io/maven/io/confluent/kafka-connect-avro-converter/2.0.0-SNAPSHOT/maven-metadata.xml
Downloading:
http://packages.confluent.io/maven/io/confluent/kafka-connect-avro-converter/2.0.0-SNAPSHOT/kafka-connect-avro-converter-2.0.0-SNAPSHOT.pom
[WARNING] The POM for
io.confluent:kafka-connect-avro-converter:jar:2.0.0-SNAPSHOT is missing,
no dependency information available
Downloading:
http://packages.confluent.io/maven/io/confluent/common-config/2.0.0-SNAPSHOT/maven-metadata.xml
Downloading:
http://packages.confluent.io/maven/io/confluent/common-config/2.0.0-SNAPSHOT/common-config-2.0.0-SNAPSHOT.pom
[WARNING] The POM for io.confluent:common-config:jar:2.0.0-SNAPSHOT is
missing, no dependency information available
Downloading:
http://packages.confluent.io/maven/io/confluent/kafka-connect-avro-converter/2.0.0-SNAPSHOT/kafka-connect-avro-converter-2.0.0-SNAPSHOT.jar
Downloading:
http://packages.confluent.io/maven/io/confluent/common-config/2.0.0-SNAPSHOT/common-config-2.0.0-SNAPSHOT.jar
[INFO]
[INFO] BUILD FAILURE

> --
> Thanks,
> Ewen
Re: kafka connect (copycat) question
Sorry, there was an out-of-date reference in the pom.xml; the version on
master should build fine now.

-Ewen

On Sat, Nov 14, 2015 at 1:54 PM, Venkatesh Rudraraju <
venkatengineer...@gmail.com> wrote:

> I tried building copycat-hdfs but it's not able to pull dependencies from
> Maven...
>
> error trace:
> ---
> Failed to execute goal on project kafka-connect-hdfs: Could not resolve
> dependencies for project
> io.confluent:kafka-connect-hdfs:jar:2.0.0-SNAPSHOT: The following
> artifacts could not be resolved: org.apache.kafka:connect-api:jar:0.9.0.0,
> io.confluent:kafka-connect-avro-converter:jar:2.0.0-SNAPSHOT,
> io.confluent:common-config:jar:2.0.0-SNAPSHOT: Could not find artifact
> org.apache.kafka:connect-api:jar:0.9.0.0 in confluent
>
> On Thu, Nov 12, 2015 at 2:59 PM, Ewen Cheslack-Postava wrote:
>
> > Yes, though it's still awaiting some updates after some renaming and
> > API modifications that happened in Kafka recently.
> >
> > -Ewen
> >
> > On Thu, Nov 12, 2015 at 9:10 AM, Venkatesh Rudraraju <
> > venkatengineer...@gmail.com> wrote:
> >
> > > Ewen,
> > >
> > > How do I use an HDFSSinkConnector? I see the sink as part of a
> > > Confluent project (
> > > https://github.com/confluentinc/copycat-hdfs/blob/master/src/main/java/io/confluent/copycat/hdfs/HdfsSinkConnector.java
> > > ).
> > > Does it mean that I build this project and add the jar to kafka libs?
> > >
> > > On Tue, Nov 10, 2015 at 9:35 PM, Ewen Cheslack-Postava <
> > > e...@confluent.io> wrote:
> > >
> > > > Venkatesh,
> > > >
> > > > 1. It only works with quotes because the message needs to be parsed
> > > > as JSON -- a bare string without quotes is not valid JSON. If
> > > > you're just using a file sink, you can also try the
> > > > StringConverter, which only supports strings and uses a fixed
> > > > schema, but is also very easy to use since it has minimal
> > > > requirements. It's really meant for demonstration purposes more
> > > > than anything else, but may be helpful just to get up and running.
> > > > 2. Which JsonParser error? When processing a message fails, we
> > > > need to be careful about how we handle it. Currently it will not
> > > > proceed if it can't process a message, since for a lot of
> > > > applications it isn't acceptable to drop messages. By default, we
> > > > want at-least-once semantics, with exactly-once as long as we
> > > > don't encounter any crashes or network errors. Manual intervention
> > > > is currently required in that case.
> > > >
> > > > -Ewen
> > > >
> > > > On Tue, Nov 10, 2015 at 8:58 PM, Venkatesh Rudraraju <
> > > > venkatengineer...@gmail.com> wrote:
> > > >
> > > > > Hi Ewen,
> > > > >
> > > > > Thanks for the explanation. With your suggested setting, I was
> > > > > able to start just a sink connector like below:
> > > > >
> > > > > bin/connect-standalone.sh config/connect-standalone.properties
> > > > > config/connect-file-sink.properties
> > > > >
> > > > > But I still have a couple of issues:
> > > > > 1) Since I am only testing a simple file sink connector, I am
> > > > > manually producing some messages to the 'connect-test' kafka
> > > > > topic, where the sink-Task is reading from. And it works only if
> > > > > the message is within double-quotes.
> > > > > 2) Once I hit the above JsonParser error on the SinkTask, the
> > > > > connector is hung, and doesn't take any more messages, even
> > > > > proper ones.
> > > > >
> > > > > On Tue, Nov 10, 2015 at 1:59 PM, Ewen Cheslack-Postava <
> > > > > e...@confluent.io> wrote:
> > > > >
> > > > > > Hi Venkatesh,
> > > > > >
> > > > > > If you're using the default settings included in the sample
> > > > > > configs, it'll expect JSON data in a special format to support
> > > > > > passing schemas along with the data. This is turned on by
> > > > > > default because it makes it possible to work with a *lot* more
> > > > > > connectors and data storage systems (many require schemas!),
> > > > > > though it does mean consuming regular JSON data won't work out
> > > > > > of the box. You can easily switch this off by changing these
> > > > > > lines in the worker config:
> > > > > >
> > > > > > key.converter.schemas.enable=true
> > > > > > value.converter.schemas.enable=true
> > > > > >
> > > > > > to be false instead. However, note that this will only work
> > > > > > with connectors that can work with "schemaless" data. This
> > > > > > wouldn't work for, e.g., writing Avro files in HDFS since they
> > > > > > need schema information, but it might work for other formats.
> > > > > > This would allow you to consume JSON data from any topic in
> > > > > > which it already exists.
> > > > > >
> > > > > > Note that JSON is not the only format you can use. You can also
> > > > > > substitute other
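For reference, the unresolved connect-api coordinates in the error above
correspond to a Maven dependency stanza like the following (illustrative
only -- the thread does not show the actual pom.xml change Ewen made):

    <dependency>
      <groupId>org.apache.kafka</groupId>
      <artifactId>connect-api</artifactId>
      <version>0.9.0.0</version>
    </dependency>

connect-api 0.9.0.0 is published to Maven Central, so a resolution failure
against only the Confluent repository usually points at a repository or
version mismatch in the POM rather than a missing artifact.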
Re: kafka connect (copycat) question
I tried building copycat-hdfs but it's not able to pull dependencies from
Maven...

error trace:
---
Failed to execute goal on project kafka-connect-hdfs: Could not resolve
dependencies for project
io.confluent:kafka-connect-hdfs:jar:2.0.0-SNAPSHOT: The following
artifacts could not be resolved: org.apache.kafka:connect-api:jar:0.9.0.0,
io.confluent:kafka-connect-avro-converter:jar:2.0.0-SNAPSHOT,
io.confluent:common-config:jar:2.0.0-SNAPSHOT: Could not find artifact
org.apache.kafka:connect-api:jar:0.9.0.0 in confluent

On Thu, Nov 12, 2015 at 2:59 PM, Ewen Cheslack-Postava wrote:

> Yes, though it's still awaiting some updates after some renaming and API
> modifications that happened in Kafka recently.
>
> -Ewen
>
> On Thu, Nov 12, 2015 at 9:10 AM, Venkatesh Rudraraju <
> venkatengineer...@gmail.com> wrote:
>
> > Ewen,
> >
> > How do I use an HDFSSinkConnector? I see the sink as part of a
> > Confluent project (
> > https://github.com/confluentinc/copycat-hdfs/blob/master/src/main/java/io/confluent/copycat/hdfs/HdfsSinkConnector.java
> > ).
> > Does it mean that I build this project and add the jar to kafka libs?
> >
> > On Tue, Nov 10, 2015 at 9:35 PM, Ewen Cheslack-Postava <
> > e...@confluent.io> wrote:
> >
> > > Venkatesh,
> > >
> > > 1. It only works with quotes because the message needs to be parsed
> > > as JSON -- a bare string without quotes is not valid JSON. If you're
> > > just using a file sink, you can also try the StringConverter, which
> > > only supports strings and uses a fixed schema, but is also very easy
> > > to use since it has minimal requirements. It's really meant for
> > > demonstration purposes more than anything else, but may be helpful
> > > just to get up and running.
> > > 2. Which JsonParser error? When processing a message fails, we need
> > > to be careful about how we handle it. Currently it will not proceed
> > > if it can't process a message, since for a lot of applications it
> > > isn't acceptable to drop messages. By default, we want at-least-once
> > > semantics, with exactly-once as long as we don't encounter any
> > > crashes or network errors. Manual intervention is currently required
> > > in that case.
> > >
> > > -Ewen
> > >
> > > On Tue, Nov 10, 2015 at 8:58 PM, Venkatesh Rudraraju <
> > > venkatengineer...@gmail.com> wrote:
> > >
> > > > Hi Ewen,
> > > >
> > > > Thanks for the explanation. With your suggested setting, I was able
> > > > to start just a sink connector like below:
> > > >
> > > > bin/connect-standalone.sh config/connect-standalone.properties
> > > > config/connect-file-sink.properties
> > > >
> > > > But I still have a couple of issues:
> > > > 1) Since I am only testing a simple file sink connector, I am
> > > > manually producing some messages to the 'connect-test' kafka topic,
> > > > where the sink-Task is reading from. And it works only if the
> > > > message is within double-quotes.
> > > > 2) Once I hit the above JsonParser error on the SinkTask, the
> > > > connector is hung, and doesn't take any more messages, even proper
> > > > ones.
> > > >
> > > > On Tue, Nov 10, 2015 at 1:59 PM, Ewen Cheslack-Postava <
> > > > e...@confluent.io> wrote:
> > > >
> > > > > Hi Venkatesh,
> > > > >
> > > > > If you're using the default settings included in the sample
> > > > > configs, it'll expect JSON data in a special format to support
> > > > > passing schemas along with the data. This is turned on by default
> > > > > because it makes it possible to work with a *lot* more connectors
> > > > > and data storage systems (many require schemas!), though it does
> > > > > mean consuming regular JSON data won't work out of the box. You
> > > > > can easily switch this off by changing these lines in the worker
> > > > > config:
> > > > >
> > > > > key.converter.schemas.enable=true
> > > > > value.converter.schemas.enable=true
> > > > >
> > > > > to be false instead. However, note that this will only work with
> > > > > connectors that can work with "schemaless" data. This wouldn't
> > > > > work for, e.g., writing Avro files in HDFS since they need schema
> > > > > information, but it might work for other formats. This would
> > > > > allow you to consume JSON data from any topic in which it already
> > > > > exists.
> > > > >
> > > > > Note that JSON is not the only format you can use. You can also
> > > > > substitute other implementations of the Converter interface.
> > > > > Confluent has implemented an Avro version that works well with
> > > > > our schema registry (
> > > > > https://github.com/confluentinc/schema-registry/tree/master/avro-converter
> > > > > ).
> > > > > The JSON implementation made sense to add as the one included
> > > > > with Kafka simply because it didn't introduce any other
> > > > > dependencies that weren't already in Kafka. It's also possible
> > > > > to write
Re: kafka connect (copycat) question
Yes, though it's still awaiting some updates after some renaming and API
modifications that happened in Kafka recently.

-Ewen

On Thu, Nov 12, 2015 at 9:10 AM, Venkatesh Rudraraju <
venkatengineer...@gmail.com> wrote:

> Ewen,
>
> How do I use an HDFSSinkConnector? I see the sink as part of a Confluent
> project (
> https://github.com/confluentinc/copycat-hdfs/blob/master/src/main/java/io/confluent/copycat/hdfs/HdfsSinkConnector.java
> ).
> Does it mean that I build this project and add the jar to kafka libs?
>
> On Tue, Nov 10, 2015 at 9:35 PM, Ewen Cheslack-Postava wrote:
>
> > Venkatesh,
> >
> > 1. It only works with quotes because the message needs to be parsed as
> > JSON -- a bare string without quotes is not valid JSON. If you're just
> > using a file sink, you can also try the StringConverter, which only
> > supports strings and uses a fixed schema, but is also very easy to use
> > since it has minimal requirements. It's really meant for demonstration
> > purposes more than anything else, but may be helpful just to get up
> > and running.
> > 2. Which JsonParser error? When processing a message fails, we need to
> > be careful about how we handle it. Currently it will not proceed if it
> > can't process a message, since for a lot of applications it isn't
> > acceptable to drop messages. By default, we want at-least-once
> > semantics, with exactly-once as long as we don't encounter any crashes
> > or network errors. Manual intervention is currently required in that
> > case.
> >
> > -Ewen
> >
> > On Tue, Nov 10, 2015 at 8:58 PM, Venkatesh Rudraraju <
> > venkatengineer...@gmail.com> wrote:
> >
> > > Hi Ewen,
> > >
> > > Thanks for the explanation. With your suggested setting, I was able
> > > to start just a sink connector like below:
> > >
> > > bin/connect-standalone.sh config/connect-standalone.properties
> > > config/connect-file-sink.properties
> > >
> > > But I still have a couple of issues:
> > > 1) Since I am only testing a simple file sink connector, I am
> > > manually producing some messages to the 'connect-test' kafka topic,
> > > where the sink-Task is reading from. And it works only if the message
> > > is within double-quotes.
> > > 2) Once I hit the above JsonParser error on the SinkTask, the
> > > connector is hung, and doesn't take any more messages, even proper
> > > ones.
> > >
> > > On Tue, Nov 10, 2015 at 1:59 PM, Ewen Cheslack-Postava <
> > > e...@confluent.io> wrote:
> > >
> > > > Hi Venkatesh,
> > > >
> > > > If you're using the default settings included in the sample
> > > > configs, it'll expect JSON data in a special format to support
> > > > passing schemas along with the data. This is turned on by default
> > > > because it makes it possible to work with a *lot* more connectors
> > > > and data storage systems (many require schemas!), though it does
> > > > mean consuming regular JSON data won't work out of the box. You can
> > > > easily switch this off by changing these lines in the worker
> > > > config:
> > > >
> > > > key.converter.schemas.enable=true
> > > > value.converter.schemas.enable=true
> > > >
> > > > to be false instead. However, note that this will only work with
> > > > connectors that can work with "schemaless" data. This wouldn't work
> > > > for, e.g., writing Avro files in HDFS since they need schema
> > > > information, but it might work for other formats. This would allow
> > > > you to consume JSON data from any topic in which it already exists.
> > > >
> > > > Note that JSON is not the only format you can use. You can also
> > > > substitute other implementations of the Converter interface.
> > > > Confluent has implemented an Avro version that works well with our
> > > > schema registry (
> > > > https://github.com/confluentinc/schema-registry/tree/master/avro-converter
> > > > ).
> > > > The JSON implementation made sense to add as the one included with
> > > > Kafka simply because it didn't introduce any other dependencies
> > > > that weren't already in Kafka. It's also possible to write
> > > > implementations for other formats (e.g. Thrift, Protocol Buffers,
> > > > Cap'n Proto, MessagePack, and more), but I'm not aware of anyone
> > > > who has started to tackle those converters yet.
> > > >
> > > > -Ewen
> > > >
> > > > On Tue, Nov 10, 2015 at 1:23 PM, Venkatesh Rudraraju <
> > > > venkatengineer...@gmail.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I am trying out the new kafka connect service.
> > > > >
> > > > > version : kafka_2.11-0.9.0.0
> > > > > mode: standalone
> > > > >
> > > > > I have a conceptual question on the service.
> > > > >
> > > > > Can I just start a sink connector which reads from Kafka and
> > > > > writes to, say, HDFS?
> > > > > From what I have tried, it's expecting a source-connector as well
> > > > > because the sink-connector is expecting a particular pattern of
> > > > > the message in
Re: kafka connect (copycat) question
Hi Venkatesh,

If you're using the default settings included in the sample configs, it'll
expect JSON data in a special format to support passing schemas along with
the data. This is turned on by default because it makes it possible to
work with a *lot* more connectors and data storage systems (many require
schemas!), though it does mean consuming regular JSON data won't work out
of the box. You can easily switch this off by changing these lines in the
worker config:

key.converter.schemas.enable=true
value.converter.schemas.enable=true

to be false instead. However, note that this will only work with
connectors that can work with "schemaless" data. This wouldn't work for,
e.g., writing Avro files in HDFS since they need schema information, but
it might work for other formats. This would allow you to consume JSON data
from any topic in which it already exists.

Note that JSON is not the only format you can use. You can also substitute
other implementations of the Converter interface. Confluent has
implemented an Avro version that works well with our schema registry (
https://github.com/confluentinc/schema-registry/tree/master/avro-converter
).
The JSON implementation made sense to add as the one included with Kafka
simply because it didn't introduce any other dependencies that weren't
already in Kafka. It's also possible to write implementations for other
formats (e.g. Thrift, Protocol Buffers, Cap'n Proto, MessagePack, and
more), but I'm not aware of anyone who has started to tackle those
converters yet.

-Ewen

On Tue, Nov 10, 2015 at 1:23 PM, Venkatesh Rudraraju <
venkatengineer...@gmail.com> wrote:

> Hi,
>
> I am trying out the new kafka connect service.
>
> version : kafka_2.11-0.9.0.0
> mode: standalone
>
> I have a conceptual question on the service.
>
> Can I just start a sink connector which reads from Kafka and writes to,
> say, HDFS?
> From what I have tried, it's expecting a source-connector as well because
> the sink-connector is expecting a particular pattern of the message in
> kafka-topic.
>
> Thanks,
> Venkat

--
Thanks,
Ewen
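To make the "special format" concrete: with schemas.enable=true, the
JsonConverter expects each message to be an envelope carrying both the
schema and the payload, along these lines (the values here are illustrative
-- any valid JSON works):

    {"schema":{"type":"string","optional":false},"payload":"hello"}

With schemas.enable=false, it accepts plain JSON values, so the same record
is just:

    "hello"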
Re: kafka connect (copycat) question
Hi Ewen,

Thanks for the explanation. With your suggested setting, I was able to
start just a sink connector like below:

bin/connect-standalone.sh config/connect-standalone.properties
config/connect-file-sink.properties

But I still have a couple of issues:
1) Since I am only testing a simple file sink connector, I am manually
producing some messages to the 'connect-test' kafka topic, where the
sink-Task is reading from. And it works only if the message is within
double-quotes.
2) Once I hit the above JsonParser error on the SinkTask, the connector is
hung, and doesn't take any more messages, even proper ones.

On Tue, Nov 10, 2015 at 1:59 PM, Ewen Cheslack-Postava wrote:

> Hi Venkatesh,
>
> If you're using the default settings included in the sample configs, it'll
> expect JSON data in a special format to support passing schemas along with
> the data. This is turned on by default because it makes it possible to
> work with a *lot* more connectors and data storage systems (many require
> schemas!), though it does mean consuming regular JSON data won't work out
> of the box. You can easily switch this off by changing these lines in the
> worker config:
>
> key.converter.schemas.enable=true
> value.converter.schemas.enable=true
>
> to be false instead. However, note that this will only work with
> connectors that can work with "schemaless" data. This wouldn't work for,
> e.g., writing Avro files in HDFS since they need schema information, but
> it might work for other formats. This would allow you to consume JSON
> data from any topic in which it already exists.
>
> Note that JSON is not the only format you can use. You can also
> substitute other implementations of the Converter interface. Confluent
> has implemented an Avro version that works well with our schema registry (
> https://github.com/confluentinc/schema-registry/tree/master/avro-converter
> ).
> The JSON implementation made sense to add as the one included with Kafka
> simply because it didn't introduce any other dependencies that weren't
> already in Kafka. It's also possible to write implementations for other
> formats (e.g. Thrift, Protocol Buffers, Cap'n Proto, MessagePack, and
> more), but I'm not aware of anyone who has started to tackle those
> converters yet.
>
> -Ewen
>
> On Tue, Nov 10, 2015 at 1:23 PM, Venkatesh Rudraraju <
> venkatengineer...@gmail.com> wrote:
>
> > Hi,
> >
> > I am trying out the new kafka connect service.
> >
> > version : kafka_2.11-0.9.0.0
> > mode: standalone
> >
> > I have a conceptual question on the service.
> >
> > Can I just start a sink connector which reads from Kafka and writes
> > to, say, HDFS?
> > From what I have tried, it's expecting a source-connector as well
> > because the sink-connector is expecting a particular pattern of the
> > message in kafka-topic.
> >
> > Thanks,
> > Venkat
>
> --
> Thanks,
> Ewen

--
Victory awaits him who has everything in order--luck, people call it.
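One way to reproduce the behavior Venkatesh describes (a sketch, assuming a
local broker on localhost:9092 and the stock connect-test topic name):

    bin/kafka-console-producer.sh --broker-list localhost:9092 --topic connect-test
    "hello world"     <-- delivered: the quotes make this a valid JSON string
    hello world       <-- fails in the sink: a bare string is not valid JSON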
Re: kafka connect (copycat) question
Venkatesh,

1. It only works with quotes because the message needs to be parsed as
JSON -- a bare string without quotes is not valid JSON. If you're just
using a file sink, you can also try the StringConverter, which only
supports strings and uses a fixed schema, but is also very easy to use
since it has minimal requirements. It's really meant for demonstration
purposes more than anything else, but may be helpful just to get up and
running.
2. Which JsonParser error? When processing a message fails, we need to be
careful about how we handle it. Currently it will not proceed if it can't
process a message, since for a lot of applications it isn't acceptable to
drop messages. By default, we want at-least-once semantics, with
exactly-once as long as we don't encounter any crashes or network errors.
Manual intervention is currently required in that case.

-Ewen

On Tue, Nov 10, 2015 at 8:58 PM, Venkatesh Rudraraju <
venkatengineer...@gmail.com> wrote:

> Hi Ewen,
>
> Thanks for the explanation. With your suggested setting, I was able to
> start just a sink connector like below:
>
> bin/connect-standalone.sh config/connect-standalone.properties
> config/connect-file-sink.properties
>
> But I still have a couple of issues:
> 1) Since I am only testing a simple file sink connector, I am manually
> producing some messages to the 'connect-test' kafka topic, where the
> sink-Task is reading from. And it works only if the message is within
> double-quotes.
> 2) Once I hit the above JsonParser error on the SinkTask, the connector
> is hung, and doesn't take any more messages, even proper ones.
>
> On Tue, Nov 10, 2015 at 1:59 PM, Ewen Cheslack-Postava wrote:
>
> > Hi Venkatesh,
> >
> > If you're using the default settings included in the sample configs,
> > it'll expect JSON data in a special format to support passing schemas
> > along with the data. This is turned on by default because it makes it
> > possible to work with a *lot* more connectors and data storage systems
> > (many require schemas!), though it does mean consuming regular JSON
> > data won't work out of the box. You can easily switch this off by
> > changing these lines in the worker config:
> >
> > key.converter.schemas.enable=true
> > value.converter.schemas.enable=true
> >
> > to be false instead. However, note that this will only work with
> > connectors that can work with "schemaless" data. This wouldn't work
> > for, e.g., writing Avro files in HDFS since they need schema
> > information, but it might work for other formats. This would allow you
> > to consume JSON data from any topic in which it already exists.
> >
> > Note that JSON is not the only format you can use. You can also
> > substitute other implementations of the Converter interface. Confluent
> > has implemented an Avro version that works well with our schema
> > registry (
> > https://github.com/confluentinc/schema-registry/tree/master/avro-converter
> > ).
> > The JSON implementation made sense to add as the one included with
> > Kafka simply because it didn't introduce any other dependencies that
> > weren't already in Kafka. It's also possible to write implementations
> > for other formats (e.g. Thrift, Protocol Buffers, Cap'n Proto,
> > MessagePack, and more), but I'm not aware of anyone who has started to
> > tackle those converters yet.
> >
> > -Ewen
> >
> > On Tue, Nov 10, 2015 at 1:23 PM, Venkatesh Rudraraju <
> > venkatengineer...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I am trying out the new kafka connect service.
> > >
> > > version : kafka_2.11-0.9.0.0
> > > mode: standalone
> > >
> > > I have a conceptual question on the service.
> > >
> > > Can I just start a sink connector which reads from Kafka and writes
> > > to, say, HDFS?
> > > From what I have tried, it's expecting a source-connector as well
> > > because the sink-connector is expecting a particular pattern of the
> > > message in kafka-topic.
> > >
> > > Thanks,
> > > Venkat
> >
> > --
> > Thanks,
> > Ewen
>
> --
> Victory awaits him who has everything in order--luck, people call it.

--
Thanks,
Ewen
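For completeness, switching a worker to the StringConverter that Ewen
mentions is a two-line change in the worker properties (a sketch; these are
the class names as shipped in Kafka 0.9.x):

    key.converter=org.apache.kafka.connect.storage.StringConverter
    value.converter=org.apache.kafka.connect.storage.StringConverter

With this in place the schemas.enable settings become irrelevant, since
every message is handled as a plain string.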