Slack digest for #general - 2019-11-05

Apache Pulsar Slack Tue, 05 Nov 2019 01:11:32 -0800

2019-11-04 09:40:10 UTC - Jasper Li: Hello all,

I have a question about using Pulsar SQL after doing CDC with the Debezium 
connector. It returns the error below:


```
2019-11-04T09:32:03.590Z        WARN    statement-response-2    com.facebook.presto.server.ThrowableMapper      Request failed for /v1/statement/20191104_093201_00002_562v2/3
java.lang.IllegalArgumentException: Unsupported schema : {
  "name": "db.db.table",
  "schema": {
    "key": {
      "name": "Bytes",
      "schema": "",
      "type": "BYTES",
      "properties": {}
    },
    "value": {
      "name": "Bytes",
      "schema": "",
      "type": "BYTES",
      "properties": {}
    }
  },
  "properties": {
    "key.schema.name": "Bytes",
    "key.schema.properties": "{}",
    "key.schema.type": "BYTES",
    "kv.encoding.type": "INLINE",
    "value.schema.name": "Bytes",
    "value.schema.properties": "{}",
    "value.schema.type": "BYTES"
  }
}
```

Does this mean there is no way to query Debezium CDC records with Pulsar SQL 
unless I transform them first?

Thanks!!!!!
----
2019-11-04 09:48:12 UTC - Jasper Li: @xiaolong.ran Thank you very much!!! I have 
successfully offloaded to my GCS bucket now!!! It is cool!!!
----
2019-11-04 10:09:52 UTC - Jasper Li: Hello again,

I have another question about Debezium CDC in Pulsar compared with Kafka. I am 
moving from Kafka to Pulsar. In Kafka Connect I used Debezium to do CDC and get 
the change log from MySQL, with 
`transforms.unwrap.type=io.debezium.transforms.UnwrapFromEnvelope` and 
`key.converter=io.confluent.connect.avro.AvroConverter` / 
`value.converter=io.confluent.connect.avro.AvroConverter`, but I cannot use 
them in Pulsar IO directly. Is it possible to apply them in Pulsar IO?

Thanks again!!!
----
2019-11-04 10:56:12 UTC - Kabeer Ahmed: @tuteng Thank you for tagging @yijie
----
2019-11-04 11:10:32 UTC - Sijie Guo: @Jasper Li:

For your first question, I think PrestoSQL doesn’t support key/value schema 
yet. @Penghui Li was looking into adding that support.

For the second question, I believe you can set those. Just add those Debezium 
settings under the `configs:` section of the YAML file you used to submit the 
connector.
+1 : Jasper Li, Penghui Li
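
A minimal sketch of what "add those settings under `configs:`" could look like, written as a small Python helper that renders the section. The three property names come from the question above; the `transforms: unwrap` alias line is an assumption carried over from Kafka Connect conventions, and nothing in this thread confirms that Pulsar IO honours the transform or ships the Confluent converter.

```python
import json

# Settings quoted from the question in this thread. Whether Pulsar IO applies
# them is an assumption; verify against your connector version.
DEBEZIUM_EXTRAS = {
    "transforms": "unwrap",  # assumed: Kafka Connect style alias declaration
    "transforms.unwrap.type": "io.debezium.transforms.UnwrapFromEnvelope",
    "key.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
}


def render_configs(settings):
    """Render a flat dict as the `configs:` section of a connector YAML file."""
    lines = ["configs:"]
    for key, value in settings.items():
        # json.dumps yields a double-quoted string, which is also valid YAML.
        lines.append(f"  {key}: {json.dumps(value)}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_configs(DEBEZIUM_EXTRAS))
```

The rendered lines would be merged into the existing `configs:` block of the connector file, next to the database connection settings already there.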
----
2019-11-04 13:17:26 UTC - Berger: Good morning everyone. I’m looking for a 
messaging solution that can be installed as a multi-cluster deployment across 
two different cloud providers at the start, with more clusters added later on 
different infrastructures around the world :slightly_smiling_face:
I started with a basic installation on a k8s cluster, and now I am wondering 
whether it is possible to install Pulsar on multiple k8s clusters and connect 
them together into one multi-cluster instance. Is that something I can achieve 
with Kubernetes, or do I have to use normal instances? Even then I would 
probably need some quorum to share configuration between clusters :open_mouth:
+1 : Jasper Li
----
2019-11-04 13:57:10 UTC - Matt Mitchell: I tried to run the latest dashboard 
(last week and tip of master as of 5 mins ago) and it failed with `sudo: 
initdb: command not found`. I found that the path to `initdb` in the 
`init-postgres.sh` script references `9.6` but the image has `11`. After 
updating the script, the dashboard seems to work. Is this a known issue or 
possibly something related to my env?
----
2019-11-04 14:08:56 UTC - Matt Mitchell: Also, once it’s running, the UI 
doesn’t list anything… no tenants, namespaces etc.. maybe there’s a config/ENV 
option missing from the README?
----
2019-11-04 14:15:07 UTC - tuteng: Can you try logging in to pg to query the data?
----
2019-11-04 14:55:26 UTC - Sijie Guo: @Matt Mitchell the current dashboard uses 
topic stats for displaying tenants and namespaces. If you don’t have any 
traffic, that information might not show up. You can try the new 
management console: <https://github.com/apache/pulsar-manager>
----
2019-11-04 14:57:12 UTC - Sijie Guo: you can install pulsar in multiple k8s 
clusters and expose the proxies through a load balancer. so each cluster can 
connect to the others. In this way, you can make a global instance.
----
2019-11-04 15:10:58 UTC - Berger: @Sijie Guo Have you seen any examples 
(articles or whatever) of such a configuration? I guess in that case I can use 
this method to connect different installation types, like k8s, normal 
installations on instances, etc.
----
2019-11-04 15:18:38 UTC - Matt Mitchell: will do. thanks @Sijie Guo
----
2019-11-04 15:45:00 UTC - Alexandre DUVAL: @Jerry Peng do you have an example 
of function config yaml file?
----
2019-11-04 15:46:15 UTC - Alexandre DUVAL: Hi, how do I inject an env var into 
a Pulsar function?
----
2019-11-04 15:53:20 UTC - Alexandre DUVAL: Should env vars be passed in 
PULSAR_EXTRA_OPTS?
----
2019-11-04 16:03:16 UTC - Matteo Merli: @Jared Mackey @Raman Gupta the sequence 
id, apart from deduplication, is used to correlate a SendReceipt to a 
particular Send request. 

It’s not optional or ignored, but rather it’s stored within the message 
metadata. 

The sequence id is a per-producer, client-assigned identifier, while the 
“message id” is a storage-assigned unique identifier.
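
A toy sketch of the correlation described above, not the Pulsar client or wire protocol. `ToyProducer` and `ToyBroker` are hypothetical names, and the `(ledger, entry)` tuple is a simplified stand-in for a storage-assigned message id.

```python
import itertools


class ToyProducer:
    """Toy model of a producer: shows only the correlation idea."""

    def __init__(self):
        # Per-producer counter: the client assigns sequence ids itself.
        self._next_sequence_id = itertools.count(0)
        self.pending = {}  # sequence_id -> payload still awaiting a receipt

    def send(self, payload):
        sequence_id = next(self._next_sequence_id)
        self.pending[sequence_id] = payload
        return sequence_id

    def on_send_receipt(self, sequence_id, message_id):
        """Match a receipt to its send request via the sequence id."""
        payload = self.pending.pop(sequence_id)
        return payload, message_id


class ToyBroker:
    """Toy storage side: assigns the message id when the entry is persisted."""

    def __init__(self, ledger_id=7):
        self.ledger_id = ledger_id
        self.entries = []

    def persist(self, sequence_id, payload):
        entry_id = len(self.entries)
        # The sequence id travels with the message as metadata.
        self.entries.append((sequence_id, payload))
        return sequence_id, (self.ledger_id, entry_id)


if __name__ == "__main__":
    producer, broker = ToyProducer(), ToyBroker()
    seq_a = producer.send(b"a")
    seq_b = producer.send(b"b")
    for seq, payload in ((seq_a, b"a"), (seq_b, b"b")):
        receipt_seq, message_id = broker.persist(seq, payload)
        print(producer.on_send_receipt(receipt_seq, message_id))
```

The producer never picks the message id. It only learns it from the receipt, and the sequence id is what tells it which pending send that receipt belongs to.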
----
2019-11-04 18:55:21 UTC - Addison Higham: reading the docs on retention 
policies, want to make sure I understand something. The docs say "and" for size 
and time, does that mean that if I set a policy for 10GB size and 3 hours of 
time, that it could go beyond 10GB if I have more than 10GB of data in the 3 
hour window?
----
2019-11-04 19:27:45 UTC - Jerry Peng: No, it’s whichever limit is reached first
----
2019-11-04 19:45:06 UTC - Addison Higham: okay, that is what I thought was more 
likely the case (and maybe I missed it) but it wasn't obvious at first glance
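
A small sketch of the "whichever limit is reached first" reading, using the 10 GB / 3 hour numbers from the question. It models only the answer given in this thread for two positive limits. The function name is hypothetical, and treating a value exactly at the limit as still retained is an assumption.

```python
def is_retained(retained_mb, age_minutes, size_limit_mb, time_limit_minutes):
    """Return True while data is still inside BOTH limits.

    Retention stops as soon as either limit is exceeded, which is the same
    as saying the first limit reached wins.
    """
    if size_limit_mb <= 0 or time_limit_minutes <= 0:
        raise ValueError("this sketch only models two positive limits")
    within_size = retained_mb <= size_limit_mb
    within_time = age_minutes <= time_limit_minutes
    return within_size and within_time


if __name__ == "__main__":
    size_limit_mb = 10 * 1024   # 10 GB
    time_limit_minutes = 3 * 60  # 3 hours
    # 12 GB accumulated after only 2 hours: the size limit is hit first.
    print(is_retained(12 * 1024, 120, size_limit_mb, time_limit_minutes))
    # 4 GB after 2 hours: both limits still hold.
    print(is_retained(4 * 1024, 120, size_limit_mb, time_limit_minutes))
```

Under this reading, the policy in the question would not keep more than 10 GB just because the data is younger than 3 hours.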
----
2019-11-04 20:31:36 UTC - Jerry Peng: ```
name: jerry-function
tenant: public
namespace: default
jar: /Users/jerrypeng/workspace/incubator-pulsar/pulsar-functions/java-examples/target/pulsar-functions-api-examples.jar
className: org.apache.pulsar.functions.api.examples.TestFunction
inputSpecs:
  persistent://jerry/default/jerry-input:
    receiverQueueSize: 1000
output: persistent://jerry/default/jerry-output
parallelism: 1
cleanupSubscription: true
```
----
2019-11-04 21:17:20 UTC - Alexandre DUVAL: Is there a way to define a custom 
input schema?
----
2019-11-04 22:47:42 UTC - CTRL: hi everyone! :slightly_smiling_face:
wave : Chris Bartholomew, Matteo Merli, Karthik Ramasamy
----
2019-11-05 01:28:18 UTC - Jasper Li: Thanks for your reply!!! I am happy to 
know that PrestoSQL will support key/value schema in the future, and I will try 
to set up my Debezium connector again. :slightly_smiling_face:
----
2019-11-05 01:53:21 UTC - kay pan: hi everyone
----
2019-11-05 01:54:34 UTC - kay pan: I have an issue:
[pulsar-ordered-OrderedExecutor-7-0-EventThread] INFO  
org.apache.pulsar.zookeeper.ZooKeeperDataCache - [State:CONNECTED Timeout:30000 
sessionid:0x20043268f0f000b
----
2019-11-05 01:54:45 UTC - kay pan: Please help, thanks
----
2019-11-05 05:35:43 UTC - Gopi Krishna: 
<https://github.com/PharosProduction/tutorial-pulsar-java> If we write Java 
classes as in this link, how do we run the producer and consumer classes? I am 
confused.
----
2019-11-05 06:52:01 UTC - Gopi Krishna: Are there any connectors that can 
stream data from MongoDB to Pulsar? I can find the 
<https://pulsar.apache.org/docs/en/next/io-mongo-sink/> Pulsar sink connector, 
but no connector to pull data from MongoDB.
----
2019-11-05 06:52:48 UTC - Ali Ahmed: @Gopi Krishna You mean a CDC for MongoDB?
----
2019-11-05 06:53:11 UTC - Gopi Krishna: What is a CDC?
----
2019-11-05 06:53:52 UTC - Ali Ahmed: change data capture
----
2019-11-05 06:55:30 UTC - Gopi Krishna: Hmm, not exactly. Basically I am trying 
to read the data streamed into MongoDB through NiFi. This data can be 
historical or real-time
----
2019-11-05 06:58:09 UTC - Gopi Krishna: Any ideas?
----
2019-11-05 07:16:30 UTC - tuteng: You can try 
<https://pulsar.apache.org/docs/en/next/io-debug/> to debug
----
2019-11-05 07:16:50 UTC - tuteng: 
<https://pulsar.apache.org/docs/en/next/io-debug/#debug-in-localrun-mode>
----
2019-11-05 07:19:54 UTC - Gopi Krishna: This is just for debugging of 
mongo-connector-sink
----
2019-11-05 07:23:58 UTC - tuteng: 
<https://github.com/apache/pulsar/issues/5474>  We haven't added a MongoDB CDC 
scenario yet.
----
2019-11-05 07:24:14 UTC - tuteng: pull data from mongo
----
2019-11-05 07:24:55 UTC - Gopi Krishna: thanks will go through
----
2019-11-05 07:28:58 UTC - Sijie Guo: @tuteng: @Gopi Krishna is asking for a 
mongodb cdc.
----
