Re: Hive Pulsar Integration

2019-04-29 Thread Slim Bouguerra
Hi, here is where you can add the logic for sending to a given topic: https://github.com/apache/hive/blob/98e2a3582d5d239de6744a016d4f481312c43df2/kafka-handler/src/java/org/apache/hadoop/hive/kafka/TransactionalKafkaWriter.java#L142 Keep in mind that you might have a lot of topics, thus
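A minimal sketch of the write-side routing idea discussed here, not the code at the linked line. It assumes a hypothetical naming scheme (`<base_topic>-<partition value>`) and a hypothetical `producer_factory` callable standing in for whatever creates a real producer; none of these names come from Hive or Pulsar. It also makes the "lot of topics" concern visible: one producer is opened per distinct topic.

```python
from typing import Callable, Dict

Row = Dict[str, str]
Producer = Callable[[Row], None]


class TopicRouter:
    """Routes each row to a topic derived from its partition column (hypothetical scheme)."""

    def __init__(self, base_topic: str, partition_column: str,
                 producer_factory: Callable[[str], Producer]) -> None:
        self._base_topic = base_topic
        self._partition_column = partition_column
        self._producer_factory = producer_factory
        self._producers: Dict[str, Producer] = {}

    def topic_for(self, row: Row) -> str:
        # Assumed naming scheme: one topic per partition value.
        return f"{self._base_topic}-{row[self._partition_column]}"

    def write(self, row: Row) -> str:
        topic = self.topic_for(row)
        producer = self._producers.get(topic)
        if producer is None:
            # A new producer is created lazily the first time a topic is seen.
            producer = self._producer_factory(topic)
            self._producers[topic] = producer
        producer(row)
        return topic

    @property
    def open_producers(self) -> int:
        # Grows with the number of distinct topics written to.
        return len(self._producers)
```

In a real writer the number of open producers would need a bound or an eviction policy once the number of partitions grows; that part is left out of the sketch.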

Re: Hive Pulsar Integration

2019-04-28 Thread PengHui Li
@Slim I have updated the Google doc. I think I lost important information in the previous image. Sorry for that. On Fri, Apr 26, 2019 at 11:43 PM, Slim Bouguerra wrote: > Thanks, I can see the document now. > If my understanding is correct, you want to achieve the following: > A Hive user will submit a SQL query

Re: Hive Pulsar Integration

2019-04-26 Thread Slim Bouguerra
Thanks, I can see the document now. If my understanding is correct, you want to achieve the following: A Hive user will submit a SQL query like select * from pulsar_table where column_used_to_partition_pulsrar_topic = 'value' And you want to scan only the Pulsar topics that match that filter. Assuming
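The pruning described in this message can be sketched as follows. This is only an illustration under stated assumptions: the mapping from partition value to topic name and the helper `prune_topics` are hypothetical, and only a single equality filter on the partitioning column is handled.

```python
from typing import Dict, List, Optional


def prune_topics(topics_by_value: Dict[str, str],
                 equals_filter: Optional[str]) -> List[str]:
    """Return the topics a scan has to read.

    topics_by_value maps a partition-column value to its topic (assumed layout).
    equals_filter is the literal from `column = 'value'`, or None if the query
    has no filter on the partitioning column.
    """
    if equals_filter is None:
        # No usable filter: every topic has to be scanned.
        return sorted(topics_by_value.values())
    topic = topics_by_value.get(equals_filter)
    # A value with no topic means there is nothing to read.
    return [topic] if topic is not None else []


# Example with made-up partition values and topic names.
topics = {
    "us": "pulsar_table-us",
    "eu": "pulsar_table-eu",
    "asia": "pulsar_table-asia",
}
to_scan = prune_topics(topics, "eu")  # only the "eu" topic is read
```

Range filters or `IN` lists would need the predicate to be evaluated against every known partition value instead of a single dictionary lookup.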

Re: Hive Pulsar Integration

2019-04-26 Thread PengHui Li
@Slim I have copied the image to Google Docs; I hope it works now. https://docs.google.com/document/d/1K_JE_a47bu1I7va1GwUK36vdOKZGqGFWTt4qPuTRShg/edit?usp=sharing On Fri, Apr 26, 2019 at 12:13 AM, Slim Bouguerra wrote: > Hey, sorry, your image is not showing. Not sure why. > > On Wed, Apr 24, 2019 at 6:53 AM

Re: Hive Pulsar Integration

2019-04-25 Thread Slim Bouguerra
Hey, sorry, your image is not showing. Not sure why. On Wed, Apr 24, 2019 at 6:53 AM PengHui Li wrote: > Sorry for taking so long to reply. > > I drew a simple picture; I hope it helps with the question. > The main point is to avoid reading messages from unnecessary topics > while reading data from

Re: Hive Pulsar Integration

2019-04-24 Thread PengHui Li
Sorry for taking so long to reply. I drew a simple picture; I hope it helps with the question. The main point is to avoid reading messages from unnecessary topics while reading data from a partitioned Hive table. [image: image.png] On Sat, Apr 20, 2019 at 12:16 AM, Slim Bouguerra wrote: > Hi, I am not sure I am getting

Re: Hive Pulsar Integration

2019-04-19 Thread Slim Bouguerra
Hi, I am not sure I am getting the question 100%. Can you share a design doc or outline the big picture you have in mind? FYI, I am not very familiar with Pulsar, so please account for that :D But let me point out that Hive does not have the notion of partitions for tables backed by storage handlers, that is

Re: Hive Pulsar Integration

2019-04-17 Thread PengHui Li
@Slim I want to use a different Pulsar topic to store the data for each Hive partition. Is there a way to do this, and does this idea make sense? Can you give me some advice? On Mon, Apr 15, 2019 at 6:22 PM, 李鹏辉gmail wrote: > I already have a simple implementation that can write and query data. > I read

Re: Hive Pulsar Integration

2019-04-15 Thread 李鹏辉gmail
I already have a simple implementation that can write and query data. I read the design document and implementation of the Kafka handler. Its handling of table partitions differs from what I have in mind. I want Hive table partition locations to work with Pulsar topics. Different table partitions correspond

Re: Hive Pulsar Integration

2019-04-13 Thread Jörn Franke
I think you need to develop a custom Hive SerDe + custom Hadoop InputFormat + custom Hive OutputFormat. > On 12.04.2019 at 17:35, 李鹏辉gmail wrote: > > Hi guys, > > I’m working on the integration of Hive and Pulsar. I have run into > some problems and hope to get help here. > >

Re: Hive Pulsar Integration

2019-04-13 Thread 李鹏辉gmail
Thank you so much. This is a great help to me. :) > On Apr 12, 2019, at 23:46, Slim Bouguerra wrote: > > Hi, great to hear that you want to work on that! > We have done similar work for Kafka; you can look at the code and design doc, > which will help guide the Pulsar integration. >

Re: Hive Pulsar Integration

2019-04-12 Thread Slim Bouguerra
Hi, great to hear that you want to work on that! We have done similar work for Kafka; you can look at the code and design doc, which will help guide the Pulsar integration. https://github.com/apache/hive/tree/master/kafka-handler

Hive Pulsar Integration

2019-04-12 Thread 李鹏辉gmail
Hi guys, I’m working on the integration of Hive and Pulsar. I have run into some problems and hope to get help here. First of all, let me briefly describe the motivation. Pulsar can be used as an infinite stream that keeps both historic and streaming data, so we want to use