2019-05-08 10:03:11 UTC - James Creedy: I'm trying to get the Go client working. I'm not primarily a Go engineer, so it might be something Go related.
brew list libpulsar
/usr/local/Cellar/libpulsar/2.3.1/include/pulsar/ (41 files)
/usr/local/Cellar/libpulsar/2.3.1/lib/libpulsar.2.3.1.dylib
/usr/local/Cellar/libpulsar/2.3.1/lib/ (2 other files)

dep init
Using ^2.3.1 as constraint for direct dep github.com/apache/pulsar
Locking in v2.3.1 (7a6f2cc) for direct dep github.com/apache/pulsar
Locking in v2.1 (a96e638) for transitive dep gopkg.in/natefinch/lumberjack.v2
Locking in v1.4.1 (8bdbc7b) for transitive dep github.com/sirupsen/logrus
Locking in v1.0.2 (f55edac) for transitive dep github.com/konsorten/go-windows-terminal-sequences
Locking in master (ecd444e) for transitive dep golang.org/x/sys

go run main.go
# github.com/apache/pulsar/pulsar-client-go/pulsar
../github.com/apache/pulsar/pulsar-client-go/pulsar/c_consumer.go:104:3: could not determine kind of name for C.pulsar_configure_set_negative_ack_redelivery_delay_ms
../github.com/apache/pulsar/pulsar-client-go/pulsar/c_consumer.go:133:4: could not determine kind of name for C.pulsar_consumer_configuration_set_schema_info
../github.com/apache/pulsar/pulsar-client-go/pulsar/c_consumer.go:313:2: could not determine kind of name for C.pulsar_consumer_negative_acknowledge
../github.com/apache/pulsar/pulsar-client-go/pulsar/c_consumer.go:318:2: could not determine kind of name for C.pulsar_consumer_negative_acknowledge_id
../github.com/apache/pulsar/pulsar-client-go/pulsar/c_consumer.go:133:58: could not determine kind of name for C.pulsar_schema_type
----
2019-05-08 10:03:56 UTC - James Creedy: Seems like there is some mismatch between the C++ library and the Go wrapper.
----
2019-05-08 15:29:00 UTC - Grant Wu: What version of the C++ client do you have installed?
----
2019-05-08 15:29:07 UTC - Grant Wu: IIRC, making sure they're matching is fully manual at this point.
----
2019-05-08 18:09:17 UTC - Ryan Samo: Hey guys, we have a Spark job that uses the PulsarKafkaProducer to produce messages. Our problem is that if we take a broker node down for patching etc., the Spark job doesn't attempt to reconnect to another broker that is up and waiting for connections; instead, the jobs just sit there and retry the connection to the broker they were originally connected to. Is there a way to break this retry behavior and have them immediately find another broker? Here is the exception we see over and over, with increasing delays on the retries, adding 5 seconds on each retry...
2019/05/07 13:26:14 WARN ConnectionHandler pulsar-client-io-1-1: Could not get connection to broker: org.apache.pulsar.client.api.PulsarClientException: java.util.concurrent.CompletionException: org.apache.pulsar.shade.io.netty.channel.AbstractChannel$AnnotatedConnectException: syscall:getsockopt(..)
failed: Connection refused: 127.0.0.1:16651 -- Will try again in 11.638 s
2019/05/07 13:26:18 ERROR PulsarWriter Executor task launch worker for task 2002016: Error writing Pulsar
java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
    at com.pulsar.PulsarWriter.writePartition(PulsarWriter.java:56)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:219)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:219)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:926)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:926)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1954)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1954)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2019/05/07 13:26:23 ERROR PulsarWriter Executor task launch worker for task 2002016: Error writing Kafka
java.util.concurrent.TimeoutException
    [identical stack trace to the one above]
----
2019-05-08 18:11:28 UTC - Matteo Merli: @Ryan Samo What is the "serviceUrl" set in the client?
----
2019-05-08 18:11:41 UTC - Matteo Merli: That's what gets used for service discovery.
----
2019-05-08 18:13:13 UTC - Ryan Samo: It is a DNS entry that has 6 brokers behind it.
----
2019-05-08 18:15:48 UTC - Matteo Merli: OK, that should work. Is there any longer timeout on the DNS? The client behavior should be to try all the IPs returned in the DNS record, in sequence. Can you share the complete client logs from when the reconnections are attempted?
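For reference, a minimal sketch of what is being discussed here: the service URL is passed to the Java client as a plain hostname string, so a DNS name fronting several brokers can be re-resolved on reconnect. The hostname `brokers.example.com` is a placeholder, not from the conversation.

```java
import org.apache.pulsar.client.api.PulsarClient;

public class ClientExample {
    public static void main(String[] args) throws Exception {
        // The service URL stays a hostname string; the client resolves it
        // itself and can fail over across all A records behind the DNS name.
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://brokers.example.com:6650") // placeholder DNS name
                .build();
        client.close();
    }
}
```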
----
2019-05-08 18:16:10 UTC - Matteo Merli: Or a tcpdump on port 53 and port 6650.
----
2019-05-08 18:16:37 UTC - Ryan Samo: Let me see what I can come up with, thanks for the help.
----
2019-05-08 18:17:04 UTC - Ryan Samo: Just the Spark stuff has this issue, by the way; normal clients have been fine.
----
2019-05-08 18:19:41 UTC - Matteo Merli: Under the hood it would always be the same code :slightly_smiling_face:
----
2019-05-08 18:20:23 UTC - Ryan Samo: Yeah, that's what I saw too. The DNS is set to the standard timeout as well.
----
2019-05-08 18:21:32 UTC - Ryan Samo: What about maxNumberOfRejectedRequestPerConnection? Would that come into play here?
----
2019-05-08 18:22:01 UTC - Matteo Merli: No, that would just be a way to slow down reconnections.
----
2019-05-08 18:22:19 UTC - Ryan Samo: OK, thanks.
----
2019-05-08 18:31:09 UTC - Ryan Samo: I do see that they are using the bootstrap.servers setting in Spark, setting it to the DNS name, then handing that to the Pulsar client. I wonder if it's caching it somehow?
----
2019-05-08 18:40:31 UTC - Matteo Merli: That shouldn't be resolved at that point. It should be passed down as "serviceUrl" to the client library.
----
2019-05-08 18:41:33 UTC - Ryan Samo: Right... I bet this is their problem. Thanks.
----
2019-05-08 18:54:27 UTC - Matteo Merli: Check if the string is converted into an `InetAddress` at any point.
----
2019-05-08 18:54:54 UTC - Matteo Merli: That would trigger the DNS resolution, and it would stick with that address.
----
2019-05-08 18:55:52 UTC - Ryan Samo: OK, will do.
----
2019-05-08 19:00:20 UTC - Ryan Samo: Yup, you're right. That's exactly what's going on. Thanks, dude!
----
2019-05-08 19:19:03 UTC - Matteo Merli: Did you get any exception on the broker side?
----
2019-05-08 19:19:22 UTC - Alexandre DUVAL: @Matteo Merli that's the only error log on the broker
----
2019-05-08 19:19:35 UTC - Alexandre DUVAL: no stacktrace
----
2019-05-08 19:20:48 UTC - Matteo Merli: :confused:
----
2019-05-08 19:20:58 UTC - Matteo Merli: There should be a log.error line before the `throw new RestException(Status.INTERNAL_SERVER_ERROR, e.getMessage());`
----
2019-05-08 19:22:13 UTC - Matteo Merli: So is this simply updating the schema with auth on?
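Stepping back to the DNS thread above, here is a sketch of the pinning pitfall that was just diagnosed; the class and variable names are illustrative, not taken from the actual Spark job.

```java
import java.net.InetAddress;

import org.apache.pulsar.client.api.PulsarClient;

public class DnsPinningExample {
    public static void main(String[] args) throws Exception {
        String serviceHost = "brokers.example.com"; // placeholder DNS name

        // Problematic: resolving up front pins the client to a single IP,
        // so after a broker failover it keeps retrying that one address.
        InetAddress pinned = InetAddress.getByName(serviceHost);
        PulsarClient bad = PulsarClient.builder()
                .serviceUrl("pulsar://" + pinned.getHostAddress() + ":6650")
                .build();

        // Correct: pass the hostname through unresolved, so the client can
        // re-resolve and try all IPs behind the DNS record on reconnect.
        PulsarClient good = PulsarClient.builder()
                .serviceUrl("pulsar://" + serviceHost + ":6650")
                .build();

        bad.close();
        good.close();
    }
}
```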
----
2019-05-08 19:23:05 UTC - Alexandre DUVAL: yes
----
2019-05-08 19:23:15 UTC - Alexandre DUVAL: I can assure you there is no stacktrace on the broker
----
2019-05-08 19:23:39 UTC - Alexandre DUVAL: ```
14:41:16.961 [ForkJoinPool.commonPool-worker-3] INFO org.eclipse.jetty.server.RequestLog - 192.168.10.48 - - [07/May/2019:14:41:16 +0000] "POST /admin/v2/schemas/clevercloud/logs/full/schema HTTP/1.1" 500 386 "-" "PostmanRuntime/7.11.0" 17
14:41:28.654 [ForkJoinPool.commonPool-worker-2] INFO org.eclipse.jetty.server.RequestLog - 192.168.10.48 - - [07/May/2019:14:41:28 +0000] "POST /admin/v2/schemas/clevercloud/logs/full/schema HTTP/1.1" 500 386 "-" "PostmanRuntime/7.11.0" 13
14:41:29.009 [ForkJoinPool.commonPool-worker-2] INFO org.eclipse.jetty.server.RequestLog - 192.168.10.48 - - [07/May/2019:14:41:28 +0000] "POST /admin/v2/schemas/clevercloud/logs/full/schema HTTP/1.1" 500 386 "-" "PostmanRuntime/7.11.0" 14
14:41:29.353 [ForkJoinPool.commonPool-worker-2] INFO org.eclipse.jetty.server.RequestLog - 192.168.10.48 - - [07/May/2019:14:41:29 +0000] "POST /admin/v2/schemas/clevercloud/logs/full/schema HTTP/1.1" 500 386 "-" "PostmanRuntime/7.11.0" 15
```
----
2019-05-08 19:23:42 UTC - Alexandre DUVAL: for 4 requests
----
2019-05-08 19:23:52 UTC - Alexandre DUVAL: (no log between each line)
----
2019-05-08 19:24:21 UTC - Matteo Merli: OK, that should be easy to reproduce.
----
2019-05-08 19:24:36 UTC - Matteo Merli: Could there be an error in the POST body?
----
2019-05-08 19:25:02 UTC - Alexandre DUVAL: it could
----
2019-05-08 19:25:33 UTC - Alexandre DUVAL: the body:
```
{
  "type": "AVRO",
  "schema": "{\"type\":\"record\",\"namespace\":\"com.clevercloud\",\"name\":\"ApplicationAddonLog\",\"fields\":[{\"name\":\"@source\",\"type\":\"string\"},{\"name\":\"@source_host\",\"type\":\"string\"},{\"name\":\"@timestamp\",\"type\":\"string\"},{\"name\":\"@version\",\"type\":\"string\"},{\"name\":\"ownerId\",\"type\":\"string\"},{\"name\":\"appId\",\"type\":\"string\"},{\"name\":\"deploymentId\",\"type\":[\"null\",\"string\"]},{\"name\":\"instanceId\",\"type\":\"string\"},{\"name\":\"zone\",\"type\":\"string\"},{\"name\":\"type\",\"type\":\"string\"},{\"name\":\"tags\",\"type\":\"array\",\"items\":\"string\"},{\"name\":\"syslog_pri\",\"type\":\"string\"},{\"name\":\"syslog_program\",\"type\":\"string\"},{\"name\":\"syslog_severity_code\",\"type\":\"double\"},{\"name\":\"syslog_facility_code\",\"type\":\"double\"},{\"name\":\"syslog_facility\",\"type\":\"string\"},{\"name\":\"syslog_severity\",\"type\":\"string\"},{\"name\":\"syslog_program\",\"type\":\"string\"},{\"name\":\"message\",\"type\":\"string\"}]}"
}
```
----
2019-05-08 19:26:09 UTC - Alexandre DUVAL: But when I had an error in the body, the response said so.
----
2019-05-08 19:26:23 UTC - Matteo Merli: OK, got it.
----
2019-05-08 19:26:26 UTC - Alexandre DUVAL: like "name wasn't specified" or something like that
----
2019-05-08 19:26:52 UTC - Matteo Merli: Can you try adding the `log.error` lines before rethrowing the exception?
----
2019-05-08 19:27:40 UTC - Matteo Merli: These should always be there whenever we rethrow the 500 errors; otherwise it's very difficult to figure out the issue.
----
2019-05-08 19:27:55 UTC - Ryan Samo: So BOOTSTRAP_SERVERS_CONFIG is being set for the PulsarKafkaProducer to our DNS.
Looking at the Pulsar code, I don't think that's being cached.
----
2019-05-08 19:28:37 UTC - Ryan Samo: But your code only targets localhost:6650 too; ours has many brokers behind it.
----
2019-05-08 19:28:48 UTC - Matteo Merli: Correct, it's passed down as a string.
----
2019-05-08 19:28:56 UTC - Alexandre DUVAL: You mean add the log.error lines like classical debug logging? I thought it was a bug or a permission issue, but OK, I will do it tomorrow and keep you informed.
----
2019-05-08 19:29:31 UTC - Ryan Samo: So if we pass our DNS name into that, it should pick up all the servers behind it, right? Not just the first.
----
2019-05-08 19:29:40 UTC - Matteo Merli: Yes, we should always print the error log there.
----
2019-05-08 19:29:54 UTC - Matteo Merli: Correct.
----
2019-05-08 19:30:31 UTC - Matteo Merli: The client should try all the returned IPs in sequence.
----
2019-05-08 19:31:08 UTC - Matteo Merli: (Checking if there are any debug logs to turn on there.)
----
2019-05-08 19:31:42 UTC - Ryan Samo: Weird. So in this case, if it was writing to broker 1 and we pulled the plug, you'd expect it to move to broker 2, etc.
----
2019-05-08 19:31:45 UTC - Ryan Samo: OK, thanks.
----
2019-05-08 19:34:43 UTC - Matteo Merli: Uhm... not seeing an easy way. A capture on ports 53 and 6650 would be more helpful.
----
2019-05-08 19:35:31 UTC - Ryan Samo: From the client side?
----
2019-05-08 19:35:38 UTC - Matteo Merli: Yes
----
2019-05-08 19:35:46 UTC - Ryan Samo: K, thanks.
----
2019-05-08 19:35:53 UTC - Matteo Merli: To see what the DNS response looks like and where the client attempts to connect.
----
2019-05-08 19:36:07 UTC - Ryan Samo: Makes sense.
----
2019-05-08 19:37:01 UTC - Alexandre DUVAL: And what about the 401 on nodes which don't have the topic?
----
2019-05-08 19:37:42 UTC - Matteo Merli: That doesn't look good either...
----
2019-05-08 21:20:12 UTC - Alexandre DUVAL: Just tested: I got a 202 the first time I defined the schema, and a 500 on update with a totally different schema. Is there some backward-compatibility requirement on schemas or something like that?
----
2019-05-08 21:20:14 UTC - Alexandre DUVAL: @Matteo Merli
----
2019-05-08 21:26:58 UTC - Alexandre DUVAL: Even with a backward-compatible change (i.e. a new field), it's getting a 500.
----
2019-05-08 21:29:58 UTC - Alexandre DUVAL: Oh, a compat problem should return IncompatibleSchemaException.
----
2019-05-09 02:29:55 UTC - William Hammond: @William Hammond has joined the channel
----
2019-05-09 03:25:54 UTC - Andrew Ernst: @Andrew Ernst has joined the channel
----
2019-05-09 07:28:10 UTC - Bhavya Jindal: @Bhavya Jindal has joined the channel
----
2019-05-09 08:13:18 UTC - Shivji Kumar Jha: @Sijie Guo Can you please merge https://github.com/apache/pulsar/pull/4232? :slightly_smiling_face:
----
2019-05-09 08:32:25 UTC - jia zhai: @Shivji Kumar Jha merged it
----
2019-05-09 08:32:48 UTC - Shivji Kumar Jha: Thanks @jia zhai :blush:
----
2019-05-09 08:33:28 UTC - jia zhai: Welcome, and thanks for the great work.
----
2019-05-09 08:33:32 UTC - jia zhai: :slightly_smiling_face:
----
2019-05-09 09:08:49 UTC - Alexandre DUVAL: How do I set the AlwaysCompatible strategy? Backward, Forward, and Full work, but not this one:
```
bin/pulsar-admin namespaces set-schema-autoupdate-strategy -c "AlwaysCompatible" yo/logs
null

Reason: Either --compatibility or --disabled must be specified
```
----
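If the CLI keeps rejecting that value, a possible workaround is to set the policy programmatically; this is a sketch assuming the Java admin client of the same era exposes the strategy on the namespaces API, with `http://localhost:8080` as a placeholder admin URL.

```java
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.SchemaAutoUpdateCompatibilityStrategy;

public class SetStrategyExample {
    public static void main(String[] args) throws Exception {
        // Placeholder admin service URL; substitute the real broker address.
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")
                .build();

        // Apply the same policy the CLI would set with --compatibility.
        admin.namespaces().setSchemaAutoUpdateCompatibilityStrategy(
                "yo/logs", SchemaAutoUpdateCompatibilityStrategy.AlwaysCompatible);

        admin.close();
    }
}
```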

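Returning to the earlier schema-500 thread: the pattern Matteo asks for looks roughly like the sketch below. The surrounding method, the logger field, and the `storeSchema` helper are illustrative, not the actual broker code; only the `throw new RestException(...)` line is quoted from the conversation.

```java
import javax.ws.rs.core.Response.Status;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Illustrative REST handler fragment, not the real broker class.
public class SchemasResourceSketch {
    private static final Logger log = LoggerFactory.getLogger(SchemasResourceSketch.class);

    void postSchema(String topic, Object payload) {
        try {
            storeSchema(topic, payload); // hypothetical helper
        } catch (Exception e) {
            // Log the full stack trace before mapping to a 500, so the broker
            // log shows more than the bare Jetty access-log lines seen above.
            log.error("Failed to update schema for topic {}", topic, e);
            throw new RestException(Status.INTERNAL_SERVER_ERROR, e.getMessage());
        }
    }

    void storeSchema(String topic, Object payload) throws Exception { /* ... */ }

    // Stand-in for org.apache.pulsar.broker.web.RestException, to keep
    // this sketch self-contained.
    static class RestException extends RuntimeException {
        RestException(Status status, String msg) { super(status + ": " + msg); }
    }
}
```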