Fwd: Announcing ApacheCon @Home 2020

2020-07-01 Thread Felix Cheung


---------- Forwarded message ---------

We are pleased to announce that ApacheCon @Home will be held online,
September 29 through October 1.

More event details are available at https://apachecon.com/acah2020 but
there are a few things that I want to highlight for you, the members.

Yes, the CFP has been reopened. It will be open until the morning of
July 13th. With no restrictions on space/time at the venue, we can
accept talks from a much wider pool of speakers, so we look forward to
hearing from those of you who may have been reluctant, or unwilling, to
travel to the US.

Yes, you can add your project to the event, whether that’s one talk, or
an entire track - we have the room now. Those of you who are PMC members
will be receiving information about how to get your projects represented
at the event.

Attendance is free, as has been the trend in these events in our
industry. We do, however, offer donation options for attendees who feel
that our content is worth paying for.

Sponsorship opportunities are available immediately at
https://www.apachecon.com/acna2020/sponsors.html

If you would like to volunteer to help, we ask that you join the
plann...@apachecon.com mailing list and discuss it there, rather than
here, so that we do not have a split discussion, while we’re trying to
coordinate all of the things we have to get done in this very short time
window.

Rich Bowen,
VP Conferences, The Apache Software Foundation




Re: New Controller APIs - Pinot Issue #5390

2020-07-01 Thread Guruguha Marur Sreenivasa
Hi all,
I have updated the document with a sample response. Please share
your feedback.


-Guru

On Mon, Jun 22, 2020 at 4:32 PM Guruguha Marur Sreenivasa <
ms.gurug...@gmail.com> wrote:

> Hi all,
>
> The goal of this task is to add 2 new APIs to the controller. One gets the
> segment reload status; the other gets the segment metadata (for all segments
> of a table) from the servers themselves, including indexing information
> along with other segment metadata.
>
> The document for this issue is here.
> Please provide your feedback if any.
>
>
> Thanks,
> Guruguha
>


Apache Pinot Daily Email Digest (2020-07-01)

2020-07-01 Thread Pinot Slack Email Digest
#general
@ejankowski_pinot: @ejankowski_pinot has joined the channel
@damianoporta: Hello, I am going to pay for two cloud servers to set up the
cluster as explained in the doc (thanks @npawar). I have a doubt: they offer
vCPUs, like most cloud services. Would that be OK? Has anyone ever used their
service?
@aruncthomas: @aruncthomas has joined the channel
@pyne.suvodeep: @pyne.suvodeep has joined the channel

#random
@ejankowski_pinot: @ejankowski_pinot has joined the channel
@aruncthomas: @aruncthomas has joined the channel
@pyne.suvodeep: @pyne.suvodeep has joined the channel

#troubleshooting
@somanshu.jindal: @somanshu.jindal has joined the channel
@quietgolfer: I'm having issues with slow queries. I recently started moving
away from the built-in time columns to my own column, floored to utc_date. Now
my queries are taking 5 seconds over 80 million rows (a lot slower than before).

I removed some sensitive parts.
```metrics_offline_table_config.json: |-
{
  "tableName": "metrics",
  "tableType": "OFFLINE",
  "segmentsConfig": {
    "schemaName": "metrics",
    "timeColumnName": "timestamp",
    "timeType": "MILLISECONDS",
    "retentionTimeUnit": "DAYS",
    "retentionTimeValue": "1461",
    "segmentPushType": "APPEND",
    "segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy",
    "replication": "1"
  },
  "tableIndexConfig": {
    "loadMode": "MMAP",
    "noDictionaryColumns": ["impressions"],
    "starTreeIndexConfigs": [
      {
        "dimensionsSplitOrder": [
          "utc_date",
          "platform_id",
          "account_id",
          "campaign_id"
        ],
        "skipStarNodeCreationForDimensions": [],
        "functionColumnPairs": [
          "SUM__impressions"
        ]
      }
    ]
  },
  "tenants": {},
  "metadata": {
    "customConfigs": {}
  }
}```
The query I'm running looks pretty basic.  It's asking for aggregate stats at a 
high-level.  In my data, there are 8 unique utc_dates and 1 unique platform.
```select utc_date, sum(impressions) from metrics where platform_id = 13 group 
by utc_date```
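One sanity check for a setup like this is whether the query only touches columns that the star-tree index covers. The Python sketch below is a simplified illustration, not Pinot's planner logic. The function name `star_tree_covers_query` is hypothetical, and it models only two basic requirements: every filter and group-by column must appear in `dimensionsSplitOrder`, and every aggregation must appear in `functionColumnPairs`. Pinot applies further conditions that are not modeled here.

```python
def star_tree_covers_query(star_tree_config, filter_columns, group_by_columns, aggregations):
    """Simplified check: could this star-tree config cover the query?

    aggregations is a list of (function_name, column_name) tuples,
    for example [("sum", "impressions")].
    """
    dimensions = set(star_tree_config["dimensionsSplitOrder"])

    # Pairs are written as FUNCTION__column in the table config.
    available_pairs = set()
    for pair in star_tree_config["functionColumnPairs"]:
        function_name, column_name = pair.split("__", 1)
        available_pairs.add((function_name.upper(), column_name))

    needed_dimensions = set(filter_columns) | set(group_by_columns)
    needed_pairs = {(fn.upper(), col) for fn, col in aggregations}

    return needed_dimensions <= dimensions and needed_pairs <= available_pairs


# Values copied from the table config and query in this message.
STAR_TREE = {
    "dimensionsSplitOrder": ["utc_date", "platform_id", "account_id", "campaign_id"],
    "functionColumnPairs": ["SUM__impressions"],
}

if __name__ == "__main__":
    print(star_tree_covers_query(
        STAR_TREE,
        filter_columns=["platform_id"],
        group_by_columns=["utc_date"],
        aggregations=[("sum", "impressions")],
    ))
```

By this simplified check the query above is covered by the index, so the slowdown would have to come from somewhere else.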
Recent changes:
• switched from timestamp to my own utc_date (long).
• added `"noDictionaryColumns": ["impressions"],`
This previously was 50ms-100ms.
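
For readers unfamiliar with the "floored to utc_date" approach, here is a minimal Python sketch of one way to derive such a column. It assumes `utc_date` holds epoch milliseconds at 00:00:00 UTC of the event's day. The message only says the column is a long, so the real encoding may differ (a yyyyMMdd number would be equally plausible). The function name `floor_to_utc_date` is hypothetical.

```python
from datetime import datetime, timezone

MILLIS_PER_DAY = 24 * 60 * 60 * 1000


def floor_to_utc_date(timestamp_millis: int) -> int:
    """Round an epoch-millisecond timestamp down to 00:00:00 UTC of its day."""
    return timestamp_millis - timestamp_millis % MILLIS_PER_DAY


if __name__ == "__main__":
    event_time = 1593613800000  # 2020-07-01 14:30:00 UTC
    utc_date = floor_to_utc_date(event_time)
    # Convert back to a datetime to show which day the value represents.
    print(utc_date, datetime.fromtimestamp(utc_date / 1000, tz=timezone.utc).isoformat())
```

Every event from the same UTC day maps to the same value, which is why a table like the one described can end up with only a handful of distinct `utc_date` values.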

I'm going to bed now. No need to rush an answer.
@damianoporta: Hello everybody! I need support to set up a cluster. I followed
the instructions explained by @npawar in her video. Everything works as
expected, but I ran that test locally.
Now I need to organize all the components inside a real cluster that has 2 or 3
servers. I need to understand how to organize the components for a
high-availability architecture.
Obviously, I am talking about a very small cluster, so take "high-availability"
with a grain of salt :)
My doubt is about the distribution of the components over the servers.
For example, in the video there is just one Zookeeper instance, which we start
with `pinot-admin.sh StartZookeeper -zkPort 2181`, so the first question is:
what happens if the server with Zookeeper goes down? Can we spread two or more
Zookeeper instances over multiple servers? Supposing we can create multiple
Zookeeper instances, should every machine also have its own Controller,
Broker and Server components? Having more than one broker/controller on
the same machine does not make much sense to me, except maybe for very high
traffic. Could someone explain it a little bit more? Thanks.
@quietgolfer: I'm guessing my latency issue is related to a lack of disk. The
ingestion job still reported success even though I ran into disk issues
on my pinot-server.
@g.kishore: @quietgolfer can you paste the response stats
@g.kishore: ingestion job will succeed as long as the data gets uploaded via
controller api and stored in deep store
@g.kishore: servers can pick it up any time
@quietgolfer: Interesting. Is there a way to force the servers to pick it up
again after it failed to process internally? I just increased disk and tried
again and it worked.
@g.kishore: yes, that's the way it's supposed to work
@g.kishore: restart will work
@g.kishore: or a reset command for the segment in ERROR state
@quietgolfer: Cool, ty
@g.kishore: so the latency was also related to this?
@cinto: @cinto has joined the channel
@cinto: Hi Team,
I just installed Pinot locally and I ran the script
```./bin/quick-start-batch.sh```
It is throwing an error:
```* Offline quickstart setup complete *
Total