#general


@lorinlee1996: @lorinlee1996 has joined the channel
@linkedcarbon: @linkedcarbon has joined the channel
@varun.mukundhan: @varun.mukundhan has joined the channel
@varun.mukundhan640: @varun.mukundhan640 has joined the channel
@varun.mukundhan640: Hi folks, I am new to Pinot, so I'd like to apologize in advance for the noob questions that will be incoming: 1. Is there any major performance difference between using PQL and SQL? For example, we have a use case where we need top X aggregations. I can do this through `TOP X` using PQL and `ORDER BY <aggregation> DESC LIMIT X` using SQL. Which one do you recommend? 2. What are the differences between metrics and dimensions? I could see aggregation queries are allowed on non-string dimensions as well
  @richard892: hi Varun, 1. PQL is in the process of being deprecated, so don't start using it. There should be no performance problems with SQL. 2. Dimensions are things you might group by; metrics are things you would aggregate. So in `select type,name,sum(value) from table group by type, name`, type and name would be dimensions and value would be a metric. They are also stored differently: metrics don't have dictionaries, which makes operations like sums over them efficient, while dimensions generally do have dictionaries, which compresses the data and makes group-bys easier.
  @varun.mukundhan640: Thanks so much Richard! So `order by <aggregation> desc LIMIT X` is as efficient as using `TOP X` from PQL?
  @richard892: same query engine -> same performance
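For reference, the two top-X query shapes side by side; `myTable`, `name`, and `value` are placeholder identifiers, not names from the thread:

```sql
-- PQL style (deprecated): SELECT SUM(value) FROM myTable GROUP BY name TOP 10
-- SQL equivalent, executed by the same engine:
SELECT name, SUM(value)
FROM myTable
GROUP BY name
ORDER BY SUM(value) DESC
LIMIT 10
```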
@samkiller: @samkiller has joined the channel
@tonya: @tonya has joined the channel
@mourad.dlia: Hi team, we want to paginate over a table, but the offsets keep changing due to newly arriving events. Is there a way to ignore new events during pagination?
  @walterddr: you have to do so by adding a reasonable time-column filter, e.g. something like `WHERE timeColumn <= NOW() - 10s`
  @walterddr: the -10s is to prevent late-arriving Kafka messages from affecting the results coming back. Also, you could ask questions in the <#C011C9JHN7R|troubleshooting> channel, which will get visibility from more developers.
  @mourad.dlia: Ok, thanks for your response.
  @walterddr: FYI, also note that if you issue multiple queries against Pinot, the result set is not guaranteed to come back in the same order. You will also have to add an ORDER BY clause (I assume you are already doing so, but just in case you weren't)
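Putting Walter's two suggestions together, a stable-pagination query might look like the sketch below; `timeColumn`, `id`, and the page size are assumptions. The cutoff timestamp should be computed once client-side (e.g. now minus 10s) and reused for every page:

```sql
-- Freeze the snapshot at a fixed cutoff so new events don't shift pages,
-- and order deterministically so OFFSET stays stable across queries.
SELECT *
FROM myTable
WHERE timeColumn <= 1700000000000  -- fixed cutoff, reused for all pages
ORDER BY timeColumn, id
LIMIT 100 OFFSET 200               -- page 3 at a page size of 100
```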
@ahsen.m: @ahsen.m has joined the channel
@abhinav.wagle1: Hi, I am a little confused on the `Note` mentioned here : ```NOTE: Please specify StorageClass based on your cloud vendor. For Pinot Server, please don't mount blob store like AzureFile/GoogleCloudStorage/S3 as the data serving file system. Only use Amazon EBS/GCP Persistent Disk/Azure Disk style disks.```
  @abhinav.wagle1: For AWS, the recommendation is to go with `gp2`?
  @abhinav.wagle1: And no S3?
  @abhinav.wagle1: Any particular reason why?
  @bagi.priyank: I think it refers to the storage for completed segments and not the segment store.
  @mayanks: Deepstore should be S3 on AWS. For local attached disk on serving nodes, the recommendation is to use EBS
  @ahsen.m: what about DigitalOcean? I'm on D/O
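For AWS, the note translates to an EBS-backed StorageClass for the servers' local disks, with S3 configured only as the deep store. A minimal Helm values sketch; the field names are assumed from the Pinot Helm chart's `values.yaml`, so verify them against your chart version:

```yaml
server:
  persistence:
    enabled: true
    storageClass: gp2   # EBS-backed; gp3 on newer EKS setups
    size: 100G
# The S3 deep store is configured via controller/server configs,
# not mounted as a data-serving filesystem.
```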
@revathibalakr: @revathibalakr has joined the channel
@nizar.hejazi: Hi team, one of the requirements for supporting stream ingestion w/ upsert is to partition the input stream by the primary key. What if the input stream partition key (e.g. company id) is different from the record primary key (e.g. employee id)?
  @tisantos: @nizar.hejazi can you elaborate more on your "record primary key"? Are you referring to the column used as the sorted index in your Pinot table?
  @nizar.hejazi: No. I am referring to the primary key defined in the table schema.
  @g.kishore: Hi Nizar, you will have to repartition the input stream according to the primary key (employee id in this case)
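Kishore's point is that every record with the same primary key must land on the same stream partition, which the default Kafka partitioner already gives you if you set the message key to the primary key. A minimal sketch of the idea (hypothetical helper, not a Pinot or Kafka API):

```python
import hashlib

def partition_for(primary_key: str, num_partitions: int) -> int:
    """Map a record's primary key (e.g. employee id) to a partition.

    A stable hash of the primary key guarantees that all updates for the
    same key land on the same partition, which is what Pinot's upsert mode
    requires even when the stream was originally keyed by something else
    (e.g. company id).
    """
    digest = hashlib.md5(primary_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Every update for employee "emp-42" maps to the same partition.
assert partition_for("emp-42", 8) == partition_for("emp-42", 8)
```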

#random


@lorinlee1996: @lorinlee1996 has joined the channel
@linkedcarbon: @linkedcarbon has joined the channel
@varun.mukundhan: @varun.mukundhan has joined the channel
@varun.mukundhan640: @varun.mukundhan640 has joined the channel
@samkiller: @samkiller has joined the channel
@tonya: @tonya has joined the channel
@ahsen.m: @ahsen.m has joined the channel
@revathibalakr: @revathibalakr has joined the channel

#feat-presto-connector


@samkiller: @samkiller has joined the channel

#feat-upsert


@samkiller: @samkiller has joined the channel

#qps-metric


@samkiller: @samkiller has joined the channel

#feat-better-schema-evolution


@samkiller: @samkiller has joined the channel

#fraud


@samkiller: @samkiller has joined the channel

#inconsistent-segment


@samkiller: @samkiller has joined the channel

#pinot-power-bi


@samkiller: @samkiller has joined the channel

#pinot-website


@samkiller: @samkiller has joined the channel

#minion-star-tree


@samkiller: @samkiller has joined the channel

#troubleshooting


@lorinlee1996: @lorinlee1996 has joined the channel
@weili99: Hi, I am setting up Pinot in AWS EKS. The clusters are successfully set up in EKS. However, when I try to create the schema and load data (Sec 3.4 in this doc) by running this script: `kubectl apply -f pinot/pinot-realtime-quickstart.yml`, I see the jobs are created but not running.
  @weili99: ```kubectl get job/pinot-realtime-quickstart-pinot-table-creation -n pinot-quickstart
NAME                                             COMPLETIONS   DURATION   AGE
pinot-realtime-quickstart-pinot-table-creation   0/1           59m        59m```
  @weili99: any pointers on how to debug? I suspect the yaml file may be broken? `pinot/pinot-realtime-quickstart.yml`
  @walterddr: could you share the log of that container?
  @walterddr: also how did you start the kafka pods?
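To pull the log Walter is asking for, something along these lines should work (the job and namespace names come from the earlier `kubectl` commands):

```
# Logs from the pod the job spawned
kubectl logs -n pinot-quickstart job/pinot-realtime-quickstart-pinot-table-creation
# Events explaining why the job hasn't completed (image pulls, scheduling, etc.)
kubectl describe job -n pinot-quickstart pinot-realtime-quickstart-pinot-table-creation
```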
@linkedcarbon: @linkedcarbon has joined the channel
@varun.mukundhan: @varun.mukundhan has joined the channel
@varun.mukundhan640: @varun.mukundhan640 has joined the channel
@samkiller: @samkiller has joined the channel
@tonya: @tonya has joined the channel
@ahsen.m: @ahsen.m has joined the channel
@ahsen.m: hello, is there any tutorial on connecting Pinot with MongoDB?
  @g.kishore: there isn't one but the right way to do that would be MongoDB -> Kafka -> Pinot
  @ahsen.m: thank you will look into that.
  @mayanks: In case you are looking for CDC, then there’s this article:
  @ahsen.m: so i was thinking to just use and send mongo data to kafka topics and get those in pinot.. do i need Debezium ?
  @g.kishore: No.. but we might need a decoder to parse the mongo db change log event in kafka.. if you can share sample events, we can write a decoder quickly or you can contribute it to Pinot
  @ahsen.m: in kafka connector i’m using ``` key.converter: org.apache.kafka.connect.json.JsonConverter```
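The decoder Kishore mentions would mostly be about unwrapping the JsonConverter envelope before handing fields to Pinot. A sketch under the assumption that the connector emits a `{"schema": ..., "payload": ...}` envelope with the changed document in `payload` — the event shape here is an assumption, not a confirmed connector format:

```python
import json

def decode_mongo_event(raw: bytes) -> dict:
    """Unwrap a Mongo change event delivered via Kafka Connect's JsonConverter.

    Assumes a {"schema": ..., "payload": ...} envelope; some connectors
    serialize the document itself as a JSON string inside "payload".
    """
    event = json.loads(raw)
    payload = event.get("payload", event)
    if isinstance(payload, str):
        # The document itself arrived as a JSON string; parse it too.
        payload = json.loads(payload)
    return payload

# Illustrative event, not a real connector capture:
sample = b'{"schema": {"type": "string"}, "payload": "{\\"_id\\": \\"1\\", \\"name\\": \\"acme\\"}"}'
doc = decode_mongo_event(sample)
assert doc["name"] == "acme"
```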
@revathibalakr: @revathibalakr has joined the channel

#pinot-s3


@samkiller: @samkiller has joined the channel

#pinot-k8s-operator


@samkiller: @samkiller has joined the channel

#onboarding


@samkiller: @samkiller has joined the channel

#feat-geo-spatial-index


@samkiller: @samkiller has joined the channel

#custom-aggregators


@samkiller: @samkiller has joined the channel

#inconsistent-perf


@samkiller: @samkiller has joined the channel

#aggregators


@samkiller: @samkiller has joined the channel

#query-latency


@samkiller: @samkiller has joined the channel

#dhill-date-seg


@samkiller: @samkiller has joined the channel

#enable-generic-offsets


@samkiller: @samkiller has joined the channel

#pinot-dev


@samkiller: @samkiller has joined the channel

#community


@samkiller: @samkiller has joined the channel

#announcements


@samkiller: @samkiller has joined the channel

#s3-multiple-buckets


@samkiller: @samkiller has joined the channel

#multiple_streams


@samkiller: @samkiller has joined the channel

#presto-pinot-connector


@samkiller: @samkiller has joined the channel

#latency-during-segment-commit


@samkiller: @samkiller has joined the channel

#pinot-realtime-table-rebalance


@samkiller: @samkiller has joined the channel

#new-office-space


@samkiller: @samkiller has joined the channel

#config-tuner


@samkiller: @samkiller has joined the channel

#getting-started


@octchristmas: Hi team. I'm looking at this article to check the pre-aggregation feature. How do I verify that metrics are pre-aggregated?
  @jadami: I've done it two ways: 1. Look for it in your logs — in general, search for `Metrics aggregation`, because there are several other logs that will tell you why it wasn't able to be turned on that are not immediately apparent (like not specifying noDictionaryColumns). 2. Run a query like `SELECT $segmentName, all_time_columns, all_dimension_columns, count(*) from your_table group by $segmentName, all_time_columns, all_dimension_columns having count(*) > 1`. That should return nothing, since everything in the segment is aggregated.
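jadami's verification query, reformatted with placeholder column names (substitute your table's actual time and dimension columns):

```sql
-- If aggregation during ingestion worked, each (segment, time, dimensions)
-- combination collapses to one row, so this should return no rows.
SELECT $segmentName, timeCol, dim1, dim2, COUNT(*)
FROM your_table
GROUP BY $segmentName, timeCol, dim1, dim2
HAVING COUNT(*) > 1
```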
@linkedcarbon: @linkedcarbon has joined the channel
@varun.mukundhan: @varun.mukundhan has joined the channel
@varun.mukundhan640: @varun.mukundhan640 has joined the channel
@samkiller: @samkiller has joined the channel
@tonya: @tonya has joined the channel
@ahsen.m: @ahsen.m has joined the channel
@revathibalakr: @revathibalakr has joined the channel

#feat-partial-upsert


@samkiller: @samkiller has joined the channel

#metrics-plugin-impl


@samkiller: @samkiller has joined the channel

#debug_upsert


@samkiller: @samkiller has joined the channel

#flink-pinot-connector


@samkiller: @samkiller has joined the channel

#minion-improvements


@samkiller: @samkiller has joined the channel

#fix-numerical-predicate


@samkiller: @samkiller has joined the channel

#complex-type-support


@samkiller: @samkiller has joined the channel

#product-launch


@samkiller: @samkiller has joined the channel

#pinot-trino


@samkiller: @samkiller has joined the channel

#kinesis_help


@samkiller: @samkiller has joined the channel

#udf-type-matching


@samkiller: @samkiller has joined the channel