Apache Iceberg and Hive comparison as data source for Kylin

2024-05-08 Thread Nam Đỗ Duy via user
Dear Dev Team

I read this blog post, and it seems that Iceberg has more features than
Hive.

Do you support using Iceberg as a data source for Kylin?

Hive vs Iceberg: When to migrate to Apache Iceberg (starburst.io)


Thank you very much


Kylin5 metadata from PostgreSQL database problem

2024-05-02 Thread i...@robozor.cz
I would like to configure Kylin 5 Standalone (Docker version) so that the
source data for the models is read from PostgreSQL instead of Hive.

I downloaded the JDBC driver from
https://jdbc.postgresql.org/download/postgresql-42.7.3.jar and saved it in
$SQOOP_HOME/lib and $KYLIN_HOME/ext (following
https://www.progress.com/tutorials/jdbc/import-data-from-salesforce-and-others-to-apache-kylin-using-jdbc).

 

I set the following in kylin.properties:

kylin.metadata.url=kylin_metadata@jdbc,driverClassName=org.postgresql.Driver,url="jdbc:postgresql://dwh-db-prod:5432/dwh",username=mylogin,password=mypassword
kylin.source.default=8
kylin.source.jdbc.connection-url=jdbc:postgresql://dwh-db-prod:5432/dwh
kylin.source.jdbc.driver=org.postgresql.Driver
kylin.source.jdbc.dialect=default
kylin.source.jdbc.user=mylogin
kylin.source.jdbc.pass=mypassword
kylin.source.jdbc.sqoop-home=/usr/hdp/current/sqoop-client
kylin.source.default=8
kylin.source.jdbc.filed-delimiter=|

and then I restarted Kylin:

$KYLIN_HOME/bin/kylin.sh restart

 

I still don't see the PostgreSQL tables, only the demo SSB tables from
Hive.

Could anyone advise me on the right procedure?

Thank you very much

Josef Szylar
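
One way to narrow a problem like this down is to confirm, outside Kylin, that the JDBC driver and connection URL work at all. The following is a minimal sketch under stated assumptions: the object and method names (JdbcCheck, tryConnect) are hypothetical, the PostgreSQL driver jar mentioned above has to be on the classpath when it runs, and the URL and credentials are simply copied from the kylin.properties excerpt. It only tests connectivity from the machine it runs on; it says nothing about whether Kylin 5 picks up the JDBC source settings.

```scala
import java.sql.{Connection, DriverManager, SQLException}

object JdbcCheck {
  /** Open and immediately close a JDBC connection.
    * Right holds the database product name, Left holds the error message. */
  def tryConnect(url: String, user: String, password: String): Either[String, String] =
    try {
      val conn: Connection = DriverManager.getConnection(url, user, password)
      try Right(conn.getMetaData.getDatabaseProductName)
      finally conn.close()
    } catch {
      // Raised, for example, when no registered driver accepts the URL
      // or when the server cannot be reached.
      case e: SQLException => Left(e.getMessage)
    }

  def main(args: Array[String]): Unit = {
    // Values taken from the kylin.properties excerpt in this message.
    val result = tryConnect("jdbc:postgresql://dwh-db-prod:5432/dwh", "mylogin", "mypassword")
    println(result)
  }
}
```

A Left result here would point at the driver jar or the network rather than at Kylin itself.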

 



Kylin 4.0.1: error when sending email

2024-04-15 Thread suoli
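After configuring email alerts in Kylin 4.0.1, sending email always fails; the exception log is below. Has anyone run into this? I looked at the mail template, and the likely cause is that the parameter ${env_name} cannot be resolved.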
2024-04-11 13:46:35,024 INFO  [http-bio-7070-exec-3] cube.PathManager:66 : Deleting segment parquet path hdfs://nameservice1/kylin4.0.0/kylin_metadata/wnep/parquet/wnep_benefits_qyb_fit_order_new_dwd_model_cube/2024040300_2024040400_Y4Q
2024-04-11 13:46:35,241 ERROR [Scheduler 512464937 Job fd0b6de7-d474-47a4-9e65-e943366b7805-56] freemarker.runtime:60 : Error executing FreeMarker template
FreeMarker template error:
The following has evaluated to null or missing:
==> env_name  [in template "JOB_DISCARD.ftl" at line 101, column 19]

Tip: If the failing expression is known to be legally refer to something that's sometimes null or missing, either specify a default value like myOptionalVar!myDefault, or use <#if myOptionalVar??>when-present<#else>when-missing</#if>. (These only cover the last step of the expression; to cover the whole expression, use parenthesis: (myOptionalVar.foo)!myDefault, (myOptionalVar.foo)??

FTL stack trace ("~" means nesting-related):
- Failed at: ${env_name}  [in template "JOB_DISCARD.ftl" at line 101, column 17]

Java stack trace (for programmers):

freemarker.core.InvalidReferenceException: [... Exception message was already printed; see it above ...]
    at freemarker.core.InvalidReferenceException.getInstance(InvalidReferenceException.java:131)
    at freemarker.core.EvalUtil.coerceModelToString(EvalUtil.java:355)
    at freemarker.core.Expression.evalAndCoerceToString(Expression.java:82)
    at freemarker.core.DollarVariable.accept(DollarVariable.java:41)
    at freemarker.core.Environment.visit(Environment.java:324)
    at freemarker.core.MixedContent.accept(MixedContent.java:54)
    at freemarker.core.Environment.visit(Environment.java:324)
    at freemarker.core.Environment.process(Environment.java:302)
    at freemarker.template.Template.process(Template.java:325)
    at org.apache.kylin.common.util.MailTemplateProvider.buildMailContent(MailTemplateProvider.java:63)
    at org.apache.kylin.job.util.MailNotificationUtil.getMailContent(MailNotificationUtil.java:70)
    at org.apache.kylin.engine.mr.CubingJob.formatNotifications(CubingJob.java:251)
    at org.apache.kylin.job.execution.AbstractExecutable.notifyUserStatusChange(AbstractExecutable.java:368)
    at org.apache.kylin.job.execution.DefaultChainedExecutable.onStatusChange(DefaultChainedExecutable.java:179)
    at org.apache.kylin.engine.mr.CubingJob.onStatusChange(CubingJob.java:280)
    at org.apache.kylin.job.execution.DefaultChainedExecutable.onExecuteFinished(DefaultChainedExecutable.java:125)
    at org.apache.kylin.engine.mr.CubingJob.onExecuteFinished(CubingJob.java:276)
    at org.apache.kylin.job.execution.AbstractExecutable.onExecuteFinishedWithRetry(AbstractExecutable.java:138)
    at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:228)
    at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:113)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2024-04-11 13:46:35,254 ERROR [Scheduler 512464937 Job fd0b6de7-d474-47a4-9e65-e943366b7805-56] execution.AbstractExecutable:371 : error send email
java.lang.NullPointerException
    at org.apache.kylin.shaded.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:878)
    at org.apache.kylin.shaded.com.google.common.base.Joiner.toString(Joiner.java:452)
    at org.apache.kylin.shaded.com.google.common.base.Joiner.appendTo(Joiner.java:109)
    at org.apache.kylin.shaded.com.google.common.base.Joiner.appendTo(Joiner.java:152)
    at org.apache.kylin.shaded.com.google.common.base.Joiner.join(Joiner.java:195)
    at org.apache.kylin.shaded.com.google.common.base.Joiner.join(Joiner.java:185)
    at org.apache.kylin.shaded.com.google.common.base.Joiner.join(Joiner.java:203)
    at org.apache.kylin.job.util.MailNotificationUtil.getMailTitle(MailNotificationUtil.java:79)
    at org.apache.kylin.engine.mr.CubingJob.formatNotifications(CubingJob.java:252)
    at org.apache.kylin.job.execution.AbstractExecutable.notifyUserStatusChange(AbstractExecutable.java:368)
    at org.apache.kylin.job.execution.DefaultChainedExecutable.onStatusChange(DefaultChainedExecutable.java:179)
    at org.apache.kylin.engine.mr.CubingJob.onStatusChange(CubingJob.java:280)
    at org.apache.kylin.job.execution.DefaultChainedExecutable.onExecuteFinished(DefaultChainedExecutable.java:125)
    at org.apache.kylin.engine.mr.CubingJob.onExecuteFinished(CubingJob.java:276)
    at org.apache.kylin.job.execution.AbstractExecutable.onExecuteFinishedWithRetry(AbstractExecutable.java:138)
    at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:228)
    at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:113)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at

Kylin 4.0.1: exception in email alerts

2024-04-11 Thread suoli
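Hello, after a Kylin 4.0.1 build job finishes, sending the alert email raises an exception. According to the log, the error occurs while building the mail template content: env_name cannot be obtained, even though kylin.properties sets kylin.env=QA. How can this be resolved? The error log follows: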
2024-04-11 18:03:38,738 ERROR [Scheduler 1170635011 Job 43525ef8-d8ee-4b17-a326-a7e97fcad95f-86] freemarker.runtime:60 : Error executing FreeMarker template
FreeMarker template error:
The following has evaluated to null or missing:
==> env_name  [in template "JOB_SUCCEED.ftl" at line 100, column 19]

Tip: If the failing expression is known to be legally refer to something that's sometimes null or missing, either specify a default value like myOptionalVar!myDefault, or use <#if myOptionalVar??>when-present<#else>when-missing</#if>. (These only cover the last step of the expression; to cover the whole expression, use parenthesis: (myOptionalVar.foo)!myDefault, (myOptionalVar.foo)??

FTL stack trace ("~" means nesting-related):
- Failed at: ${env_name}  [in template "JOB_SUCCEED.ftl" at line 100, column 17]

Java stack trace (for programmers):

freemarker.core.InvalidReferenceException: [... Exception message was already printed; see it above ...]
    at freemarker.core.InvalidReferenceException.getInstance(InvalidReferenceException.java:131)
    at freemarker.core.EvalUtil.coerceModelToString(EvalUtil.java:355)
    at freemarker.core.Expression.evalAndCoerceToString(Expression.java:82)
    at freemarker.core.DollarVariable.accept(DollarVariable.java:41)
    at freemarker.core.Environment.visit(Environment.java:324)
    at freemarker.core.MixedContent.accept(MixedContent.java:54)
    at freemarker.core.Environment.visit(Environment.java:324)
    at freemarker.core.Environment.process(Environment.java:302)
    at freemarker.template.Template.process(Template.java:325)
    at org.apache.kylin.common.util.MailTemplateProvider.buildMailContent(MailTemplateProvider.java:63)
    at org.apache.kylin.job.util.MailNotificationUtil.getMailContent(MailNotificationUtil.java:70)
    at org.apache.kylin.engine.mr.CubingJob.formatNotifications(CubingJob.java:251)
    at org.apache.kylin.job.execution.AbstractExecutable.notifyUserStatusChange(AbstractExecutable.java:368)
    at org.apache.kylin.job.execution.DefaultChainedExecutable.onStatusChange(DefaultChainedExecutable.java:179)
    at org.apache.kylin.engine.mr.CubingJob.onStatusChange(CubingJob.java:280)
    at org.apache.kylin.job.execution.DefaultChainedExecutable.onExecuteFinished(DefaultChainedExecutable.java:160)
    at org.apache.kylin.engine.mr.CubingJob.onExecuteFinished(CubingJob.java:276)
    at org.apache.kylin.job.execution.AbstractExecutable.onExecuteFinishedWithRetry(AbstractExecutable.java:138)
    at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:228)
    at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:113)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2024-04-11 18:03:38,773 ERROR [Scheduler 1170635011 Job 43525ef8-d8ee-4b17-a326-a7e97fcad95f-86] execution.AbstractExecutable:371 : error send email
java.lang.NullPointerException
    at org.apache.kylin.shaded.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:878)
    at org.apache.kylin.shaded.com.google.common.base.Joiner.toString(Joiner.java:452)
    at org.apache.kylin.shaded.com.google.common.base.Joiner.appendTo(Joiner.java:109)
    at org.apache.kylin.shaded.com.google.common.base.Joiner.appendTo(Joiner.java:152)
    at org.apache.kylin.shaded.com.google.common.base.Joiner.join(Joiner.java:195)
    at org.apache.kylin.shaded.com.google.common.base.Joiner.join(Joiner.java:185)
    at org.apache.kylin.shaded.com.google.common.base.Joiner.join(Joiner.java:203)
    at org.apache.kylin.job.util.MailNotificationUtil.getMailTitle(MailNotificationUtil.java:79)
    at org.apache.kylin.engine.mr.CubingJob.formatNotifications(CubingJob.java:252)
    at org.apache.kylin.job.execution.AbstractExecutable.notifyUserStatusChange(AbstractExecutable.java:368)
    at org.apache.kylin.job.execution.DefaultChainedExecutable.onStatusChange(DefaultChainedExecutable.java:179)
    at org.apache.kylin.engine.mr.CubingJob.onStatusChange(CubingJob.java:280)
    at org.apache.kylin.job.execution.DefaultChainedExecutable.onExecuteFinished(DefaultChainedExecutable.java:160)
    at org.apache.kylin.engine.mr.CubingJob.onExecuteFinished(CubingJob.java:276)
    at org.apache.kylin.job.execution.AbstractExecutable.onExecuteFinishedWithRetry(AbstractExecutable.java:138)
    at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:228)
    at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:113)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
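
The FreeMarker tip in the log points at the default-value operator (`!`) for variables that may be missing. The following is a minimal sketch of that operator in isolation, assuming FreeMarker 2.3.23 or later is on the classpath. The template text, the object name EnvNameDefaultSketch, and the render helper are illustrative only; this is not Kylin's JOB_SUCCEED.ftl, and the thread does not confirm whether editing Kylin's bundled templates is the appropriate fix.

```scala
import freemarker.template.{Configuration, Template}
import java.io.{StringReader, StringWriter}
import java.util.{HashMap => JHashMap, Map => JMap}

object EnvNameDefaultSketch {
  /** Render a template string against a data model and return the output. */
  def render(source: String, model: JMap[String, AnyRef]): String = {
    val cfg = new Configuration(Configuration.VERSION_2_3_23)
    val template = new Template("sketch", new StringReader(source), cfg)
    val out = new StringWriter()
    template.process(model, out)
    out.toString
  }

  def main(args: Array[String]): Unit = {
    // The data model deliberately has no "env_name" entry.
    val model = new JHashMap[String, AnyRef]()
    model.put("job_name", "demo_job")

    // ${env_name} alone would raise InvalidReferenceException here;
    // ${env_name!"UNKNOWN"} falls back to the given default instead.
    println(render("""Env: ${env_name!"UNKNOWN"}, job: ${job_name}""", model))
  }
}
```

The default only hides the missing value in the mail body. The second trace above (a NullPointerException in getMailTitle) is a separate failure that a template change alone would not address.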



Re: Mail from suoli

2024-04-10 Thread suoli

On 2024-04-11 13:50:20, "suoli" wrote:


Mail from suoli

2024-04-10 Thread suoli



Participate in the ASF 25th Anniversary Campaign

2024-04-03 Thread Brian Proffitt
Hi everyone,

As part of The ASF’s 25th anniversary campaign[1], we will be celebrating
projects and communities in multiple ways.

We invite all projects and contributors to participate in the following
ways:

* Individuals - submit your first contribution:
https://news.apache.org/foundation/entry/the-asf-launches-firstasfcontribution-campaign
* Projects - share your public good story:
https://docs.google.com/forms/d/1vuN-tUnBwpTgOE5xj3Z5AG1hsOoDNLBmGIqQHwQT6k8/viewform?edit_requested=true
* Projects - submit a project spotlight for the blog:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=278466116
* Projects - contact the Voice of Apache podcast (formerly Feathercast) to
be featured: https://feathercast.apache.org/help/
* Projects - use the 25th anniversary template and the #ASF25Years hashtag
on social media:
https://docs.google.com/presentation/d/1oDbMol3F_XQuCmttPYxBIOIjRuRBksUjDApjd8Ve3L8/edit#slide=id.g26b0919956e_0_13

If you have questions, email the Marketing & Publicity team at
mark...@apache.org.

Peace,
BKP

[1] https://apache.org/asf25years/

[NOTE: You are receiving this message because you are a contributor to an
Apache Software Foundation project. The ASF will very occasionally send out
messages relating to the Foundation to contributors and members, such as
this one.]

Brian Proffitt
VP, Marketing & Publicity
VP, Conferences


Re: How to query the Cube via API and use the dataset for other purpose

2024-04-03 Thread Nam Đỗ Duy via user
Thank you very much for your response. I asked an expert for help, and below
is the sample code, based on the sample SSB project, which I would like to
contribute in case it helps someone with the same issue:

==


import org.apache.spark.sql.types.{StringType, StructField, StructType}
import org.apache.spark.sql.{Row, SparkSession}
import org.json4s.jackson.JsonMethods
import org.json4s.{DefaultFormats, Formats}

import java.io.{BufferedReader, DataOutputStream, InputStreamReader}
import java.net.{HttpURLConnection, URL}
import java.util.Base64

object APIKylinRunSQL {

  val KYLIN_QUERY_URL = "http://localhost:7070/kylin/api/query"
  val USER_NAME = "x"
  val PASSWORD = "y"
  val KYLIN_PROJECT = "learn_kylin"

  val spark = SparkSession.builder
    .master("local")
    .appName("Convert JSON to DataFrame")
    .getOrCreate()

  def main(args: Array[String]): Unit = {

    // (label, query) pairs. A Seq is used instead of a Map because the labels
    // repeat, and a Map would silently keep only the last query per label.
    val tablesAndQueries = Seq(
      "CUSTOMER" -> "select * from SSB.CUSTOMER",
      "DATES" -> "SELECT * FROM SSB.DATES",
      "PART" -> "SELECT * FROM SSB.PART",
      "P_LINEORDER" -> "SELECT * FROM SSB.P_LINEORDER",
      "SUPPLIER" -> "SELECT * FROM SSB.SUPPLIER",
      "P_LINEORDER" -> "SELECT lo_orderdate, count(1) FROM SSB.P_LINEORDER GROUP BY lo_orderdate",
      "PART" -> "SELECT P_COLOR, count(1) FROM SSB.PART group by P_COLOR"
    )

    // number of times to run the whole set of queries
    val numberOfExecutions = 15

    // query loop
    for (i <- 1 to numberOfExecutions) {
      println(s"Executing query $i")
      for ((table, query) <- tablesAndQueries) {
        println(s"Executing queries for table $table")

        println(query)

        executeQuery(query)
        // wait one second between queries
        Thread.sleep(1000)
      }
    }

  }

  def executeQuery(sqlQuery: String): Unit = {

    // The SQL is spliced into the JSON as-is; queries containing double
    // quotes or backslashes would need escaping first.
    val queryJson =
      s"""
         |{
         |  "project": "$KYLIN_PROJECT",
         |  "sql": "$sqlQuery"
         |}
         |""".stripMargin

    // Encode the username and password for basic authentication
    val encodedAuth =
      Base64.getEncoder.encodeToString(s"$USER_NAME:$PASSWORD".getBytes("UTF-8"))

    val url = new URL(KYLIN_QUERY_URL)
    val connection = url.openConnection.asInstanceOf[HttpURLConnection]

    connection.setRequestMethod("POST")
    connection.setRequestProperty("Authorization", s"Basic $encodedAuth")
    connection.setRequestProperty("Content-Type", "application/json")
    connection.setRequestProperty("Accept", "application/json")
    connection.setDoOutput(true)

    val outputStream = connection.getOutputStream
    val writer = new DataOutputStream(outputStream)
    writer.write(queryJson.getBytes("UTF-8"))
    writer.flush()
    writer.close()

    val responseCode = connection.getResponseCode

    if (responseCode == HttpURLConnection.HTTP_OK) {
      val inputStream = connection.getInputStream
      val reader = new BufferedReader(new InputStreamReader(inputStream, "UTF-8"))
      var inputLine: String = null
      val response = new StringBuilder

      // read the response body line by line
      while ( {
        inputLine = reader.readLine;
        inputLine != null
      }) {
        response.append(inputLine)
      }
      reader.close()
      println("Result:")
      println(response.toString)

      connection.disconnect()

      // parse JSON
      implicit val formats: Formats = DefaultFormats
      val parsedJson = JsonMethods.parse(response.toString)

      val columns = (parsedJson \ "columnMetas")
        .extract[List[Map[String, Any]]]

      // dynamically build the schema based on column name information in JSON;
      // every column is treated as a nullable string
      val schema = StructType(columns.map { col =>
        val columnName = col("name").asInstanceOf[String]
        StructField(columnName, StringType, nullable = true)
      })

      schema.printTreeString()

      // get data from JSON
      val data = (parsedJson \ "results").extract[List[List[Any]]]

      // convert data to RDD[Row]
      val rowsRDD = spark.sparkContext.parallelize(data.map(row =>
        Row.fromSeq(row.map(_.asInstanceOf[AnyRef]))))

      val df = spark.createDataFrame(rowsRDD, schema)

      df.show(20, false)

    } else {
      println(s"Error: $responseCode")
      connection.disconnect()
    }
  }
}
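
The request body in executeQuery places the SQL text directly inside a JSON string, so a query containing a double quote, a backslash, or a line break would produce invalid JSON. A minimal sketch of escaping the values first, using only the Scala standard library; the names KylinPayload, jsonEscape, and buildPayload are hypothetical helpers, not part of Kylin or of the code above:

```scala
object KylinPayload {
  /** Escape a string so it can be embedded inside a JSON string literal. */
  def jsonEscape(s: String): String = {
    val sb = new StringBuilder
    s.foreach {
      case '"'  => sb.append("\\\"")
      case '\\' => sb.append("\\\\")
      case '\n' => sb.append("\\n")
      case '\r' => sb.append("\\r")
      case '\t' => sb.append("\\t")
      // remaining control characters become \u00XX escapes
      case c if c < ' ' => sb.append("\\u").append("%04x".format(c.toInt))
      case c => sb.append(c)
    }
    sb.toString
  }

  /** Build the JSON body with the "project" and "sql" fields used in this thread. */
  def buildPayload(project: String, sql: String): String =
    s"""{"project": "${jsonEscape(project)}", "sql": "${jsonEscape(sql)}"}"""

  def main(args: Array[String]): Unit = {
    // A query with a quoted identifier and a line break, both of which need escaping.
    println(buildPayload("learn_kylin", "SELECT \"P_COLOR\", count(1)\nFROM SSB.PART GROUP BY \"P_COLOR\""))
  }
}
```

A JSON library such as the json4s dependency already imported above could build the body instead; the hand-written version is shown only to keep the sketch free of dependencies.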


On Sun, Mar 31, 2024 at 8:57 PM Lionel CL  wrote:

> Hi Nam,
> You can refer to the spark docs
> https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html
>
> Regards,
> Lu Cao

Re: How to query the Cube via API and use the dataset for other purpose

2024-03-30 Thread Nam Đỗ Duy via user
Dear Sirs/Madams,

Could anyone here help me figure out how to use Scala to run a SELECT
query against a Kylin cube via the API and then turn the result table into
a DataFrame in Scala for further use?

Thank you so much for your time!

Best regards


How to query the Cube via API and use the dataset for other purpose

2024-03-29 Thread Nam Đỗ Duy via user
Hi Xiaoxiang,
Sirs & Madams,

I use the following code to query the cube via the API, but I cannot use
the result as a DataFrame. Could you suggest a way to do that? It is
very important for our project.

Thanks and best regards

===

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

object APICaller {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("APICaller")
      .master("local[*]")
      .getOrCreate()

    import spark.implicits._

    val username = "namdd"
    val password = "eer123"
    val urlString = "http://localhost:7070/kylin/api/query"
    val project = "learn_kylin"
    val query = "select count(*) from HIVE_DWH_STANDARD.factuserEvent"

    val response: String = callAPI(urlString, username, password, project, query)

    // Convert response to DataFrame.
    // This parses the whole response as one JSON document, so the DataFrame
    // has a single row with nested fields rather than one row per result.
    val df = spark.read.json(Seq(response).toDS())

    // Show DataFrame
    df.show()

    // Stop Spark session
    spark.stop()
  }

  def callAPI(url: String, username: String, password: String, project: String, query: String): String = {
    // Basic authentication is handled by .auth(...) below, so no manual
    // Base64 encoding of the credentials is needed.
    val connection = scalaj.http.Http(url)
      .postData(s"""{"project": "$project", "sql": "$query"}""")
      .header("Content-Type", "application/json")
      .header("Accept", "application/json")
      .auth(username, password)
      .asString

    if (connection.isError)
      throw new RuntimeException(s"Error calling API: ${connection.body}")

    connection.body
  }
}


Community Over Code NA 2024 Travel Assistance Applications now open!

2024-03-27 Thread Gavin McDonald
Hello to all users, contributors and Committers!

[ You are receiving this email as a subscriber to one or more ASF project
  dev or user mailing lists; it is not being sent to you directly. It is
  important that we reach all of our users and contributors/committers so
  that they may get a chance to benefit from this. We apologise in advance
  if this doesn't interest you, but it is on topic for the mailing lists of
  the Apache Software Foundation, and it is important that you do not mark
  this as spam in your email client. Thank You! ]

The Travel Assistance Committee (TAC) are pleased to announce that
travel assistance applications for Community over Code NA 2024 are now
open!

We will be supporting Community over Code NA in Denver, Colorado,
from October 7th to the 10th, 2024.

TAC exists to help those that would like to attend Community over Code
events, but are unable to do so for financial reasons. For more info
on this year's applications and qualifying criteria, please visit the
TAC website at < https://tac.apache.org/ >. Applications are already
open on https://tac-apply.apache.org/, so don't delay!

The Apache Travel Assistance Committee will only be accepting
applications from those people that are able to attend the full event.

Important: Applications close on Monday 6th May, 2024.

Applicants have until the closing date above to submit their
applications (which should contain as much supporting material as
required to efficiently and accurately process their request). This
will enable TAC to announce successful applications shortly
afterwards.

As usual, TAC expects to deal with a range of applications from a
diverse range of backgrounds; therefore, we encourage (as always)
anyone thinking about sending in an application to do so ASAP.

For those that will need a visa to enter the country, we advise you to apply
now so that you have enough time in case of interview delays. Do not
wait until you know whether you have been accepted.

We look forward to greeting many of you in Denver, Colorado, in October 2024!

Kind Regards,

Gavin

(On behalf of the Travel Assistance Committee)


Re: Pinot/Kylin/Druid quick comparison

2024-03-17 Thread Nam Đỗ Duy via user
Thank you Li Yang, I think the development of version 5 would be hard
work for you but the impact is big so please keep me posted!

All the best

On Thu, Mar 14, 2024 at 10:51 AM Li Yang  wrote:

> Nam,
>
> We are planning to release a kylin5-beta around March or April. The GA of
> kylin5 would be around July this year if everything goes well.
>
> Cheers
> Yang

Re: Pinot/Kylin/Druid quick comparison

2024-03-13 Thread Li Yang
Nam,

We are planning to release a kylin5-beta around March or April. The GA of
kylin5 would be around July this year if everything goes well.

Cheers
Yang

On Tue, Mar 5, 2024 at 6:54 PM Nam Đỗ Duy  wrote:

> Hello Xiaoxiang,
>
> How are you, my boss is very interested in Kylin 5. so he would like to
> know when Kylin 5 will be released...could you please provide an
> estimation?
>
> Thank you very much and best regards
>
> On Thu, 18 Jan 2024 at 10:05 Nam Đỗ Duy wrote:
>
> > Good morning Xiaoxiang, hope you are well
> >
> > 1. JDBC source is a feature which in development, it will be supported
> > later.
> >
> > ===
> >
> > May I know when will the JDBC be available? as well as is there any
> > change in Kylin 5 release date
> >
> > Thank you and best regards
> >
> > On Mon, Dec 11, 2023 at 2:15 PM Xiaoxiang Yu wrote:
> >
> >> 1. JDBC source is a feature which in development, it will be supported
> >> later.
> >>
> >> 2. Kylin supports kerberos now, I will write a doc as soon as possible.
> >> (I will let you know.)
> >>
> >> 3. I think ranger and Kerberos are not doing the same things, one for
> >> authentication, one for authorization. So they cannot replace each
> >> other. Ranger can integrate with Kerberos, please check ranger's
> >> website for information.
> >>
> >> With warm regard
> >> Xiaoxiang Yu
> >>
> >> On Sat, Dec 9, 2023 at 8:01 AM Nam Đỗ Duy wrote:
> >>
> >> > Thank you Xiaoxiang for your reply
> >> >
> >> > -
> >> > Do you have any suggestions/wishes for kylin 5(except real-time
> >> > feature)?
> >> > -
> >> > Yes: please answer to help me clear this headache:
> >> >
> >> > 1. Can Kylin access the existing star schema in Oracle datawarehouse ?
> >> > If not then do we have any work around?
> >> >
> >> > 2. My team is using kerberos for authentication, do you have any
> >> > document/casestudy about integrating kerberos with kylin 4.x and
> >> > kylin 5.x
> >> >
> >> > 3. Should we use apache ranger instead of kerberos for authentication
> >> > and for security purposes?
> >> >
> >> > Thank you again
> >> >
> >> > On Thu, 7 Dec 2023 at 15:00 Xiaoxiang Yu wrote:
> >> >
> >> > > I guess the release date should be 2024/01 .
> >> > > Do you have any suggestions/wishes for kylin 5(except real-time
> >> > > feature)?
> >> > >
> >> > > With warm regard
> >> > > Xiaoxiang Yu
> >> > >
> >> > > On Thu, Dec 7, 2023 at 3:44 PM Nam Đỗ Duy wrote:
> >> > >
> >> > >> Thank you very much xiaoxiang, I did the presentation this morning
> >> > >> already so there is no time for you to comment. Next time I will
> >> > >> send you in advance. The meeting result was that we will implement
> >> > >> both druid and kylin in the next couple of projects because of its
> >> > >> realtime feature. Hope that kylin will have same feature soon.
> >> > >>
> >> > >> May I ask when will you release kylin 5.0?
> >> > >>
> >> > >> On Thu, Dec 7, 2023 at 9:26 AM Xiaoxiang Yu wrote:
> >> > >>
> >> > >> > Since 2018 there are a lot of new features and code refactor.
> >> > >> > If you like, you can share your ppt to me privately, maybe I can
> >> > >> > give some comments.
> >> > >> >
> >> > >> > Here is the reference of advantages of Kylin since 2018:
> >> > >> > - https://kylin.apache.org/blog/2022/01/12/The-Future-Of-Kylin/
> >> > >> > - https://kylin.apache.org/blog/2021/07/02/Apache-Kylin4-A-new-storage-and-compute-architecture/
> >> > >> > - https://kylin.apache.org/5.0/docs/development/roadmap
> >> > >> >
> >> > >> > With warm regard
> >> > >> > Xiaoxiang Yu
> >> > >> >
> >> > >> > On Wed, Dec 6, 2023 at 6:53 PM Nam Đỗ Duy wrote:
> >> > >> >
> >> > >> >> Hi Xiaoxiang, tomorrow is the main presentation between Kylin
> >> > >> >> and Druid in my team.
> >> > >> >>
> >> > >> >> I found this article and would like you to update me the
> >> > >> >> advantages of Kylin since 2018 until now (especially with
> >> > >> >> version 5 to be released)
> >> > >> >>
> >> > >> >> Apache Kylin | Why did Meituan develop Kylin On Druid (part 1 of 2)?
> >> > >> >> <https://kylin.apache.org/blog/2018/12/12/why-did-meituan-develop-kylin-on-druid-part1-of-2/>
> >> > >> >>
> >> > >> >> On Wed, Dec 6, 2023 at 9:34 AM Nam Đỗ Duy wrote:
> >> > >> >>
> >> > >> >> > Thank you very much for your prompt response, I still have
> >> > >> >> > several questions to seek for your help later.
> >> > >> >> >
> >> > >> >> > Best regards and have a good day
> >> > >> >> >
> >> > >> >> > On Wed, Dec 6, 2023 at 9:11 AM Xiaoxiang Yu wrote:
> >> > >> >> >
> >> > >> >> >> Done. Github branch changed to kylin5.
> >> > >> >> >>
> >> > >> >> >>

Re: Pinot/Kylin/Druid quick comparision

2024-03-05 Thread Nam Đỗ Duy via user
Hello Xiaoxiang,

How are you? My boss is very interested in Kylin 5, so he would like to
know when it will be released. Could you please provide an estimate?

Thank you very much and best regards






Community Over Code Asia 2024 Travel Assistance Applications now open!

2024-02-20 Thread Gavin McDonald
Hello to all users, contributors and Committers!

The Travel Assistance Committee (TAC) are pleased to announce that
travel assistance applications for Community over Code Asia 2024 are now
open!

We will be supporting Community over Code Asia, Hangzhou, China
July 26th - 28th, 2024.

TAC exists to help those that would like to attend Community over Code
events, but are unable to do so for financial reasons. For more info
on this year's applications and qualifying criteria, please visit the
TAC website at < https://tac.apache.org/ >. Applications are already
open on https://tac-apply.apache.org/, so don't delay!

The Apache Travel Assistance Committee will only be accepting
applications from those people that are able to attend the full event.

Important: Applications close on Friday, May 10th, 2024.

Applicants have until the closing date above to submit their
applications (which should contain as much supporting material as
required to efficiently and accurately process their request), this
will enable TAC to announce successful applications shortly
afterwards.

As usual, TAC expects to deal with a range of applications from a
diverse range of backgrounds; therefore, we encourage (as always)
anyone thinking about sending in an application to do so ASAP.

For those that will need a Visa to enter the Country - we advise you to
apply now so that you have enough time in case of interview delays. So do
not wait until you know if you have been accepted or not.

We look forward to greeting many of you in Hangzhou, China in July, 2024!

Kind Regards,

Gavin

(On behalf of the Travel Assistance Committee)


Re: kylin4_on_cloud deployment errors

2024-02-07 Thread John W
I followed the troubleshooting instructions at:
https://github.com/apache/kylin/blob/kylin4_on_cloud/readme/trouble_shooting.md#kylin-can-not-access-and-exception-session-0x0-for-server-null-unexpected-error-closing-socket-connection-and-attempting-reconnect-is-in-kylinlog

When logging into the zookeeper instances, the .bash_profile file looks
standard and there is no reference to $ZOOKEEPER_HOME

[root@ip-172-27-32-153 ec2-user]# cat .bash_profile
--
# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi

# User specific environment and startup programs

PATH=$PATH:$HOME/.local/bin:$HOME/bin

export PATH
---

So the deployment did not configure the zookeeper instances at all. Would
anyone know how to fix this?

Any help appreciated.
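
One way to surface this kind of failure earlier is to check for the config file before appending the quorum entries. The sketch below is not the kylin4_on_cloud code: `append_quorum_entries` is a hypothetical helper that mirrors the `grep -Fq` check and the `echo ... >> zoo.cfg` step seen in the deployment log, and it raises a clear error when `zoo.cfg` is missing, which would suggest ZooKeeper was never unpacked on the instance. The port defaults are taken from the addresses in the log.

```python
from pathlib import Path


def append_quorum_entries(cfg_path, hosts, peer_port=2888, election_port=3888):
    """Append missing server.N lines to zoo.cfg and return the lines added.

    Hypothetical helper, not part of kylin4_on_cloud. It mirrors the
    `grep -Fq ...` check and the `echo ... >> zoo.cfg` step from the log.
    """
    cfg = Path(cfg_path)
    if not cfg.is_file():
        # Same condition that produced "No such file or directory" in the log.
        raise FileNotFoundError(
            f"{cfg} is missing; ZooKeeper may not be installed on this host"
        )
    existing = cfg.read_text()
    added = []
    for index, host in enumerate(hosts, start=1):
        address = f"{host}:{peer_port}:{election_port}"
        if address not in existing:  # plain substring test, like `grep -F`
            added.append(f"server.{index}={address}")
    if added:
        # Start on a fresh line if the file does not end with a newline.
        prefix = "" if not existing or existing.endswith("\n") else "\n"
        with cfg.open("a") as handle:
            handle.write(prefix + "\n".join(added) + "\n")
    return added
```

Calling it a second time with the same hosts returns an empty list, so the step stays idempotent in the same way the `grep` guard intends.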


kylin4_on_cloud deployment errors

2024-02-06 Thread John W
Hi, I'm having problems deploying the kylin4_on_cloud project located at:
https://github.com/apache/kylin/tree/kylin4_on_cloud

I've also been following the instructions here
https://www.youtube.com/watch?v=5kKXEMjO1Sc_channel=Kyligence

I used windows to git clone the repo and set up the venv with the latest
packages via:
pip install PyYAML
pip install boto3
pip install botocore
pip install pyparsing
pip install requests
pip install retrying
pip install Jinja2
pip install pytest-shutil
:
I also changed the RDSEngineVersion to 8.0.35 in kylin_configs.yaml, as
RDSEngineVersion 5.7.25 (default repo version) was giving me the error
"Exception: Current stack: ec2-rds-stack is create failed, please check".

Here's the log with error I am now getting:

==
(venv) C:\projects\kylin4_on_cloud>python deploy.py --type deploy --mode job
2024-02-07 02:13:54 - botocore.credentials - INFO - 5484 - Found
credentials in shared credentials file: ~/.aws/credentials
2024-02-07 02:13:57 - engine - INFO - 5484 - Env already inited, skip init
again.
2024-02-07 02:13:58 - clouds.aws - WARNING - 5484 - Current env for
deploying a cluster is not ready.
2024-02-07 02:14:20 - instances.aws_instance - INFO - 5484 - Now creating
stack: ec2-or-emr-vpc-stack.
2024-02-07 02:16:42 - instances.aws_instance - INFO - 5484 - Now creating
stack: ec2-rds-stack.
2024-02-07 02:21:06 - instances.aws_instance - INFO - 5484 - Now creating
stack: ec2-static-service-stack.
2024-02-07 02:21:06 - engine - INFO - 5484 - First launch default Kylin
Cluster.
2024-02-07 02:22:08 - clouds.aws - WARNING - 5484 - Current cluster is not
ready.
2024-02-07 02:22:30 - instances.aws_instance - INFO - 5484 - Now creating
stack: ec2-zookeeper-stack.
2024-02-07 02:23:43 - instances.aws_instance - INFO - 5484 - Current
execute commands in `Zookeeper stack` which named ec2-zookeeper-stack.
2024-02-07 02:23:43 - instances.aws_instance - INFO - 5484 - Current
instance id: i-0cbc37f83c9cda006 is executing commands: grep -Fq
"10.1.0.133:2888:3888" /home/ec2-user/hadoop/zookeeper/conf/zoo.cfg; echo
$?.
2024-02-07 02:23:49 - instances.aws_instance - INFO - 5484 - Current
instance id: i-0915d44c700e644dc is executing commands: grep -Fq
"10.1.0.129:2888:3888" /home/ec2-user/hadoop/zookeeper/conf/zoo.cfg; echo
$?.
2024-02-07 02:23:54 - instances.aws_instance - INFO - 5484 - Current
instance id: i-0fdbacc22ecae360a is executing commands: grep -Fq
"10.1.0.58:2888:3888" /home/ec2-user/hadoop/zookeeper/conf/zoo.cfg; echo $?.
2024-02-07 02:24:00 - instances.aws_instance - INFO - 5484 - Current
instance id: i-0cbc37f83c9cda006 is executing commands: echo
'server.1=10.1.0.133:2888:3888
server.2=10.1.0.129:2888:3888
server.3=10.1.0.58:2888:3888' >>
/home/ec2-user/hadoop/zookeeper/conf/zoo.cfg.
2024-02-07 02:24:05 - instances.aws_instance - WARNING - 5484 -
{'CommandId': '704b776f-e574-47ea-bf13-30d3be2e9df2', 'InstanceId':
'i-0cbc37f83c9cda006', 'Comment': '', 'DocumentName': 'AWS-RunShellScript',
'DocumentVersion': '$DEFAULT', 'PluginName': 'aws:runShellScript',
'ResponseCode': 1,
'ExecutionStartDateTime': '2024-02-06T16:24:00.394Z',
'ExecutionElapsedTime': 'PT0.008S', 'ExecutionEndDateTime':
'2024-02-06T16:24:00.394Z', 'Status': 'Failed', 'StatusDetails': 'Failed',
'StandardOutputContent': '', 'StandardOutputUrl': '',
'StandardErrorContent':
'/var/lib/amazon/ssm/i-0cbc37f83c9cda006/document/orchestration/704b776f-e574-47ea-bf13-30d3be2e9df2/awsrunShellScript/0.awsrunShellScript/_script.sh:
line 3: /home/ec2-user/hadoop/zookeeper/conf/zoo.cfg: No such file or
directory\nfailed to run commands: exit status 1', 'StandardErrorUrl': '',
'CloudWatchOutputConfig': {'CloudWatchLogGroupName': '',
'CloudWatchOutputEnabled': False}, 'ResponseMetadata': {'RequestId':
'133ea7d8-d661-4ea0-960d-349b294dd8a9', 'HTTPStatusCode': 200,
'HTTPHeaders': {'server': 'Server', 'date': 'Tue, 06 Feb 2024 16:24:05
GMT', 'content-type': 'application/x-amz-json-1.1', 'content-length':
'848', 'connection': 'keep-alive', 'x-amzn-requestid':
'133ea7d8-d661-4ea0-960d-349b294dd8a9'}, 'RetryAttempts': 0}}
Traceback (most recent call last):
  File "C:\myfiles\_clients\me\kylin\kylin4_on_cloud\deploy.py", line 141,
in <module>
deploy_on_aws(args.type, args.kylin_mode, args.scale_type,
args.node_type, args.cluster)
  File "C:\myfiles\_clients\me\kylin\kylin4_on_cloud\deploy.py", line 63,
in deploy_on_aws
aws_engine.launch_default_cluster()
  File "C:\myfiles\_clients\me\kylin\kylin4_on_cloud\engine.py", line 38,
in launch_default_cluster
self.engine_utils.launch_default_cluster()
  File
"C:\myfiles\_clients\me\kylin\kylin4_on_cloud\utils\engine_utils.py", line
101, in launch_default_cluster
cloud_addr = self.get_kylin_address()
  File
"C:\myfiles\_clients\me\kylin\kylin4_on_cloud\utils\engine_utils.py", line
217, in get_kylin_address
kylin_address = self.aws.get_kylin_address()
  File 
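
The WARNING entry in the log above is a raw response dictionary, and the actual cause (`No such file or directory`) is buried in `StandardErrorContent`. As an illustration only, the sketch below condenses such a dictionary into one line. `describe_failed_command` is a hypothetical helper, not part of kylin4_on_cloud, and it assumes the dictionary has the same keys as the one printed in the log, with `Status` equal to `Success` and `ResponseCode` equal to 0 on a successful run.

```python
def describe_failed_command(result):
    """Return a one-line summary of a failed command result, or None on success.

    Hypothetical helper; `result` is assumed to be a dict shaped like the
    one printed in the WARNING log entry.
    """
    # Assumption: a successful run reports ResponseCode 0 and Status "Success".
    if result.get("ResponseCode") == 0 and result.get("Status") == "Success":
        return None
    stderr = result.get("StandardErrorContent", "").strip()
    # Keep only the first stderr line, which names the failing path.
    first_line = stderr.splitlines()[0] if stderr else "no stderr captured"
    return (
        f"{result.get('InstanceId', 'unknown instance')}: "
        f"{result.get('Status', 'Unknown')} "
        f"(exit {result.get('ResponseCode')}): {first_line}"
    )
```

Applied to the entry above, the summary would name instance `i-0cbc37f83c9cda006`, the `Failed` status, and the missing `zoo.cfg` path on a single line.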

Community over Code EU 2024 Travel Assistance Applications now open!

2024-02-03 Thread Gavin McDonald
Hello to all users, contributors and Committers!

The Travel Assistance Committee (TAC) are pleased to announce that
travel assistance applications for Community over Code EU 2024 are now
open!

We will be supporting Community over Code EU, Bratislava, Slovakia,
June 3rd - 5th, 2024.

TAC exists to help those that would like to attend Community over Code
events, but are unable to do so for financial reasons. For more info
on this year's applications and qualifying criteria, please visit the
TAC website at < https://tac.apache.org/ >. Applications are already
open on https://tac-apply.apache.org/, so don't delay!

The Apache Travel Assistance Committee will only be accepting
applications from those people that are able to attend the full event.

Important: Applications close on Friday, March 1st, 2024.

Applicants have until the closing date above to submit their
applications (which should contain as much supporting material as
required to efficiently and accurately process their request), this
will enable TAC to announce successful applications shortly
afterwards.

As usual, TAC expects to deal with a range of applications from a
diverse range of backgrounds; therefore, we encourage (as always)
anyone thinking about sending in an application to do so ASAP.

For those that will need a Visa to enter the Country - we advise you to apply
now so that you have enough time in case of interview delays. So do not
wait until you know if you have been accepted or not.

We look forward to greeting many of you in Bratislava, Slovakia in June,
2024!

Kind Regards,

Gavin

(On behalf of the Travel Assistance Committee)



CVE-2023-29055: Apache Kylin: Insufficiently protected credentials in config file

2024-01-29 Thread Li Yang
Severity: low

Affected versions:

- Apache Kylin 2.0.0 through 4.0.3

Description:

In Apache Kylin versions 2.0.0 through 4.0.3, the Server Config web interface
displays the content of the file 'kylin.properties', which may contain
server-side credentials. When the Kylin service runs over HTTP (or another
plain-text protocol), network sniffers can intercept the HTTP payload, read
the content of kylin.properties, and potentially obtain the credentials it
contains.

To avoid this threat, users are recommended to:

  *  Always turn on HTTPS so that the network payload is encrypted.
  *  Avoid putting credentials in kylin.properties, or at least not in plain
text.
  *  Use network firewalls to protect the server side so that it is not
accessible to external attackers.
  *  Upgrade to Apache Kylin 4.0.4, which filters out the sensitive
content that goes to the Server Config web interface.
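
As a rough aid for the second recommendation, the sketch below lists keys in a properties file whose names look credential-like and whose values are non-empty. It is an illustration under stated assumptions, not the filter shipped in Kylin 4.0.4: the helper name `find_plaintext_credentials` and the list of sensitive key fragments are made up for this example.

```python
import re

# Key fragments treated as sensitive; an illustrative list, not Kylin's own filter.
SENSITIVE = re.compile(r"(password|passwd|pass|secret|token)", re.IGNORECASE)


def find_plaintext_credentials(properties_text):
    """Return credential-like keys that carry a non-empty value.

    Hypothetical helper: it only inspects key names in `key=value` lines.
    """
    hits = []
    for raw in properties_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and value.strip() and SENSITIVE.search(key):
            hits.append(key.strip())
    return hits
```

Because only key names are checked, a credential embedded inside a value (for example, inside a JDBC connection string) would not be reported, so a manual review of kylin.properties is still needed.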

Credit:

Li Jiakun <2839549...@qq.com> (reporter)

References:

https://kylin.apache.org/
https://www.cve.org/CVERecord?id=CVE-2023-29055



[Announce] Apache Kylin 4.0.4 released

2024-01-28 Thread Li Yang
The Apache Kylin team is pleased to announce the immediate availability of
the 4.0.4 release.

This is a minor release with 5 small improvements.
All of the changes in this release can be found in:
https://kylin.apache.org/docs/release_notes.html

You can download the source release and binary packages from Apache Kylin's
download page: https://kylin.apache.org/download/

Apache Kylin is an open-source Distributed Analytical Data Warehouse for
Big Data; it was designed to provide OLAP (Online Analytical Processing)
capability in the big data era. By renovating the multi-dimensional cube
and precalculation technology on Hadoop and Spark, Kylin is able to achieve
near-constant query speed regardless of the ever-growing data volume.
Reducing query latency from minutes to sub-second, Kylin brings online
analytics back to big data.

Apache Kylin lets you query billions of rows at sub-second latency in 3
steps:
1. Identify a Star/Snowflake Schema on Hadoop.
2. Build Cube from the identified tables.
3. Query using ANSI-SQL and get results in sub-second, via ODBC, JDBC or
RESTful API.
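
For step 3, a query over the RESTful API might be prepared as in the sketch below. This is a minimal illustration, not official client code: it assumes the `/kylin/api/query` path, HTTP basic authentication, and the `sql`, `project`, and `limit` body fields as documented for the Kylin 4.x REST API, so please verify them against the documentation for your version. The helper name `build_query_request` is hypothetical, and the function only builds the request without sending it.

```python
import base64
import json
import urllib.request


def build_query_request(base_url, project, sql, user, password, limit=100):
    """Build (but do not send) an HTTP request for Kylin's query endpoint.

    Assumes the `/kylin/api/query` path and JSON body fields of the
    Kylin 4.x REST API; check them against the docs for your version.
    """
    body = json.dumps({"sql": sql, "project": project, "limit": limit}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/kylin/api/query",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )
```

The returned object can be passed to `urllib.request.urlopen` and the response body parsed with `json.loads`. Since basic authentication sends credentials with every call, an HTTPS base URL is the safer choice.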

Thanks to everyone who has contributed to this release.

We welcome your help and feedback. For more information on how to report
problems, and to get involved, visit the project website at
https://kylin.apache.org/


Regards
Yang


Unsubscribe

2024-01-26 Thread 黄伟晟
Unsubscribe

Re: Unsubscribe

2024-01-26 Thread 王劲松
Unsubscribe


Notice:
This message (and any attachments) may contain information that is 
confidential, proprietary, privileged or otherwise protected by law. The 
message is intended solely for the named addressee (or a person responsible for 
delivering it to the addressee). If you are not the intended recipient of this 
message, you are not authorized to read, print, retain, copy or disseminate 
this message or any part of it. If you have received this message in error, 
please destroy the message or delete it from your system immediately and notify 
the sender. CEB cannot guarantee that this e-mail is secure, error free and/or 
virus-free as e-mail messages could be intercepted, altered, corrupted, lost, 
delayed or become incomplete and/or infected by viruses in the course of their 
transmission. CEB and the sender therefore do not accept liability for any loss 
or damage arising from any errors or omissions in the contents of this e-mail.


Unsubscribe

2024-01-26 Thread liyangd163
Unsubscribe

Re: Pinot/Kylin/Druid quick comparision

2024-01-17 Thread Nam Đỗ Duy via user
Good morning Xiaoxiang, hope you are well

1. JDBC source is a feature which in development, it will be supported
later.

===

May I know when the JDBC source will be available, and whether there is any
change to the Kylin 5 release date?

Thank you and best regards


On Mon, Dec 11, 2023 at 2:15 PM Xiaoxiang Yu  wrote:

> 1. JDBC source is a feature which in development, it will be supported
> later.
>
> 2. Kylin supports kerberos now, I will write a doc as soon as possible.
> (I will let you know.)
>
> 3. I think ranger and Kerberos are not doing the same things, one for
> authentication, one for authorization. So they cannot replace each other.
> Ranger can integrate with Kerberos, please check ranger's website for
> information.
>
> 
> With warm regard
> Xiaoxiang Yu
>
>
>
> On Sat, Dec 9, 2023 at 8:01 AM Nam Đỗ Duy  wrote:
>
> > Thank you Xiaoxiang for your reply
> >
> > -
> > Do you have any suggestions/wishes for kylin 5(except real-time feature)?
> > -
> > Yes: please answer to help me clear this headache:
> >
> > 1. Can Kylin access the existing star schema in Oracle datawarehouse ? If
> > not then do we have any work around?
> >
> > 2. My team is using kerberos for authentication, do you have any
> > document/casestudy about integrating kerberos with kylin 4.x and kylin
> 5.x
> >
> > 3. Should we use apache ranger instead of kerberos for authentication and
> > for security purposes?
> >
> > Thank you again
> >
> > On Thu, 7 Dec 2023 at 15:00 Xiaoxiang Yu  wrote:
> >
> > > I guess the release date should be 2024/01 .
> > > Do you have any suggestions/wishes for kylin 5(except real-time
> feature)?
> > >
> > > 
> > > With warm regard
> > > Xiaoxiang Yu
> > >
> > >
> > >
> > > On Thu, Dec 7, 2023 at 3:44 PM Nam Đỗ Duy 
> > wrote:
> > >
> > >> Thank you very much xiaoxiang, I did the presentation this morning
> > already
> > >> so there is no time for you to comment. Next time I will send you in
> > >> advance. The meeting result was that we will implement both druid and
> > >> kylin
> > >> in the next couple of projects because of its realtime feature. Hope
> > that
> > >> kylin will have same feature soon.
> > >>
> > >> May I ask when will you release kylin 5.0?
> > >>
> > >> On Thu, Dec 7, 2023 at 9:26 AM Xiaoxiang Yu  wrote:
> > >>
> > >> > Since 2018 there are a lot of new features and code refactor.
> > >> > If you like, you can share your ppt to me privately, maybe I can
> > >> > give some comments.
> > >> >
> > >> > Here is the reference of advantages of Kylin since 2018:
> > >> > - https://kylin.apache.org/blog/2022/01/12/The-Future-Of-Kylin/
> > >> > -
> > >> >
> > >>
> >
> https://kylin.apache.org/blog/2021/07/02/Apache-Kylin4-A-new-storage-and-compute-architecture/
> > >> > - https://kylin.apache.org/5.0/docs/development/roadmap
> > >> >
> > >> > 
> > >> > With warm regard
> > >> > Xiaoxiang Yu
> > >> >
> > >> >
> > >> >
> > >> > On Wed, Dec 6, 2023 at 6:53 PM Nam Đỗ Duy 
> > >> wrote:
> > >> >
> > >> >> Hi Xiaoxiang, tomorrow is the main presentation between Kylin and
> > >> Druid in
> > >> >> my team.
> > >> >>
> > >> >> I found this article and would like you to update me the advantages
> > of
> > >> >> Kylin since 2018 until now (especially with version 5 to be
> released)
> > >> >>
> > >> >> Apache Kylin | Why did Meituan develop Kylin On Druid (part 1 of
> 2)?
> > >> >> <
> > >> >>
> > >>
> >
> https://kylin.apache.org/blog/2018/12/12/why-did-meituan-develop-kylin-on-druid-part1-of-2/
> > >> >> >
> > >> >>
> > >> >> On Wed, Dec 6, 2023 at 9:34 AM Nam Đỗ Duy  wrote:
> > >> >>
> > >> >> > Thank you very much for your prompt response, I still have
> several
> > >> >> > questions to seek for your help later.
> > >> >> >
> > >> >> > Best regards and have a good day
> > >> >> >
> > >> >> >
> > >> >> >
> > >> >> > On Wed, Dec 6, 2023 at 9:11 AM Xiaoxiang Yu 
> > wrote:
> > >> >> >
> > >> >> >> Done. Github branch changed to kylin5.
> > >> >> >>
> > >> >> >> 
> > >> >> >> With warm regard
> > >> >> >> Xiaoxiang Yu
> > >> >> >>
> > >> >> >>
> > >> >> >>
> > >> >> >> On Tue, Dec 5, 2023 at 11:13 AM Xiaoxiang Yu 
> > >> wrote:
> > >> >> >>
> > >> >> >> > A JIRA ticket has been opened, waiting for INFRA :
> > >> >> >> > https://issues.apache.org/jira/browse/INFRA-25238 .
> > >> >> >> > 
> > >> >> >> > With warm regard
> > >> >> >> > Xiaoxiang Yu
> > >> >> >> >
> > >> >> >> >
> > >> >> >> >
> > >> >> >> > On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy
> >  > >> >
> > >> >> >> wrote:
> > >> >> >> >
> > >> >> >> >> Thank you Xiaoxiang, please update me when you have changed
> > your
> > >> >> >> default
> > >> >> >> >> branch. In case people are impressed by the numbers then I
> hope
> > >> to
> > >> >> turn
> > >> >> >> >> this situation to reverse direction.
> > >> >> >> >>
> > >> >> >> >> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu  >
> > >> >> wrote:
> > 

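Unsubscribe

2024-01-08 Thread 2415370...@qq.com
Unsubscribe



2415370...@qq.com
 
From: lee
Sent: 2023-12-06 08:33
To: user
Subject: Re: Error when building data in kylin4.0.3
Unsubscribe

On Dec 5, 2023, at 19:33, 李甜彪  wrote:

The build fails with an error. The data in Hive is fine, and a build with empty data succeeds. I suspected a data problem, so I wrote a few rows by hand, but the build failed with the same error, which shows that the original data is not the cause.
The error message shown on the page is as follows: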
java.io.IOException: OS command error exit with return code: 1, error message: 
che.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at 
org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:498)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:501)
... 3 more

}
RetryInfo{
   overrideConf : {},
   throwable : java.lang.RuntimeException: Error execute org.apache.kylin.engine.spark.job.CubeBuildJob
at org.apache.kylin.engine.spark.application.SparkApplication.execute(SparkApplication.java:96)
at org.apache.spark.application.JobWorker$$anon$2.run(JobWorker.scala:55)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 74.0 failed 4 times, most recent failure: Lost task 0.3 in stage 74.0 (TID 186) (store2 executor 20): java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hive.conf.HiveConf$ConfVars
at org.apache.hadoop.hive.serde2.lazy.LazySerDeParameters.(LazySerDeParameters.java:103)
at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.initialize(LazySimpleSerDe.java:125)
at org.apache.spark.sql.hive.HadoopTableReader.$anonfun$makeRDDForTable$3(TableReader.scala:136)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:863)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:863)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:498)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:501)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2303)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2252)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2251)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2251)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1124)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1124)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1124)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2490)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2432)
at 

Re: Unsubscribe

2023-12-17 Thread lee
Unsubscribe

> On Dec 17, 2023, at 16:48, 怪侠一枝梅 wrote:
> 
> Unsubscribe



Unsubscribe

2023-12-17 Thread 怪侠一枝梅


Re: Unsubscribe

2023-12-15 Thread Yanjing Wang
Unsubscribe

On Wed, Dec 13, 2023 at 17:01, 313463...@qq.com <313463...@qq.com> wrote:

> Unsubscribe
>
> --
> 313463...@qq.com
> 313463...@qq.com
>
> 
>
>
>
> -- Original message --
> *From:* "user" ;
> *Sent:* Wednesday, Dec 13, 2023, 4:54 PM
> *To:* "user";
> *Cc:* "dev";
> *Subject:* Unsubscribe
>
> Unsubscribe
>


Re: Unsubscribe

2023-12-13 Thread 313463...@qq.com
Unsubscribe

--
313463...@qq.com

-- Original message --
From: "user"


Unsubscribe

2023-12-13 Thread ??????


Re: Unsubscribe

2023-12-13 Thread 王劲松
Unsubscribe

From: guowj
Date: 2023-12-13 16:54
To: user
CC: dev
Subject: Unsubscribe
Unsubscribe


Notice:
This message (and any attachments) may contain information that is 
confidential, proprietary, privileged or otherwise protected by law. The 
message is intended solely for the named addressee (or a person responsible for 
delivering it to the addressee). If you are not the intended recipient of this 
message, you are not authorized to read, print, retain, copy or disseminate 
this message or any part of it. If you have received this message in error, 
please destroy the message or delete it from your system immediately and notify 
the sender. CEB cannot guarantee that this e-mail is secure, error free and/or 
virus-free as e-mail messages could be intercepted, altered, corrupted, lost, 
delayed or become incomplete and/or infected by viruses in the course of their 
transmission. CEB and the sender therefore do not accept liability for any loss 
or damage arising from any errors or omissions in the contents of this e-mail.


Unsubscribe

2023-12-13 Thread guowj


Re: ACID with Hive/Kylin

2023-12-12 Thread Nam Đỗ Duy via user
Thank you both for your valuable information. I will test and report back
soon.

Best regards

On Tue, Dec 12, 2023 at 2:39 PM Xiaoxiang Yu  wrote:

> I don't know GDPR very well. Here is my understanding.
>
> For hive and hdfs, you can consider using these techniques which support
> ACID in Spark and Hive(I recommend first one):
> 1) Delta Lake,
> https://docs.databricks.com/en/security/privacy/gdpr-delta.html
> 2) Hive ACID table, here is a link,
>
> https://docs.cloudera.com/cdp-private-cloud-upgrade/latest/migrate-hive-workloads/topics/hive-acid-migration-regulations.html
>
> For Kylin, there are three places which may store data, index, snapshot,
> dict. The refresh of the snapshot costs
> less time and resources,  while refresh of index/dict much more. Snapshot
> refresh will be triggered automatically
> when you build an index every day.
>
> I think you should consider centralizing user-sensitive columns(email,
> phone, address) in dimension tables,
> and your fact table only has the foreign key(for example, uid) which refers
> to the primary key of dimension tables.
> When you are modeling in Kylin, for these dim tables which contains
> user-sensitive columns, try
>
> 1. set dim tables as snapshot by disable precompute join relation, so these
> columns won't be built into indexes, refer
>
> https://kylin.apache.org/5.0/docs/modeling/model_design/precompute_join_relations
> 2. not create a bitmap measure on these columns, so these columns won't be
> built into dict
>
> 
> With warm regard
> Xiaoxiang Yu
>
>
>
> On Tue, Dec 12, 2023 at 12:11 PM Nam Đỗ Duy 
> wrote:
>
> > Dear Xiaoxiang, Sirs/Madams
> >
> > I face an issue with deleting data of user according to GPDR-like policy
> > which means when user send request to delete their personal data, we need
> > to delete it from all system, that means to delete data:
> >
> > 1- from Kylin index (cube)
> > 2- from Hive
> > 3- from HDFS
> >
> > Have you had the same use-case before, do you have any suggestions to
> > achieve this scenario?
> >
> > Thank you very much and best regards
> >
>


Re: ACID with Hive/Kylin

2023-12-11 Thread Xiaoxiang Yu
I don't know GDPR very well. Here is my understanding.

For Hive and HDFS, you can consider using one of these techniques, which
support ACID in Spark and Hive (I recommend the first one):
1) Delta Lake,
https://docs.databricks.com/en/security/privacy/gdpr-delta.html
2) Hive ACID tables, here is a link,
https://docs.cloudera.com/cdp-private-cloud-upgrade/latest/migrate-hive-workloads/topics/hive-acid-migration-regulations.html

For Kylin, there are three places which may store data: index, snapshot,
and dict. Refreshing a snapshot costs less time and fewer resources, while
refreshing an index or dict costs much more. A snapshot refresh is
triggered automatically when you build an index every day.

I think you should consider centralizing user-sensitive columns (email,
phone, address) in dimension tables, so that your fact table only has the
foreign key (for example, uid) which refers to the primary key of the
dimension tables.
When you are modeling in Kylin, for the dim tables which contain
user-sensitive columns, try to:

1. set the dim tables as snapshots by disabling the precomputed join
relation, so these columns won't be built into indexes; refer to
https://kylin.apache.org/5.0/docs/modeling/model_design/precompute_join_relations
2. not create a bitmap measure on these columns, so these columns won't be
built into dict
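
A minimal sketch of the modeling idea above, using an in-memory SQLite
database purely as a stand-in. This is not Kylin, Hive, or Delta Lake code,
and the names `dim_user`, `fact_orders`, `uid`, `email`, and `phone` are
hypothetical. It only illustrates that when sensitive columns live in a
dimension table, erasing a user touches one dimension row while aggregates
over the fact table stay the same:

```python
import sqlite3

# In-memory SQLite stands in for the warehouse; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_user (uid INTEGER PRIMARY KEY, email TEXT, phone TEXT);
    CREATE TABLE fact_orders (order_id INTEGER PRIMARY KEY, uid INTEGER, amount REAL);
    INSERT INTO dim_user VALUES (1, 'a@example.com', '111'), (2, 'b@example.com', '222');
    INSERT INTO fact_orders VALUES (10, 1, 5.0), (11, 1, 7.0), (12, 2, 3.0);
""")


def total_by_uid(connection):
    """Aggregate the fact table by the foreign key only (no sensitive columns)."""
    rows = connection.execute(
        "SELECT uid, SUM(amount) FROM fact_orders GROUP BY uid ORDER BY uid"
    ).fetchall()
    return dict(rows)


def erase_user(connection, uid):
    """Delete the user's personal data; only the dimension table is touched."""
    connection.execute("DELETE FROM dim_user WHERE uid = ?", (uid,))
    connection.commit()


before = total_by_uid(conn)
erase_user(conn, 1)
after = total_by_uid(conn)
remaining_emails = [row[0] for row in conn.execute("SELECT email FROM dim_user ORDER BY uid")]
```

In Kylin terms, the dimension table would correspond to the snapshot that
gets refreshed, and the index built on the fact table would not need a
rebuild for this deletion, assuming the sensitive columns were never built
into an index or dict.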


With warm regard
Xiaoxiang Yu



On Tue, Dec 12, 2023 at 12:11 PM Nam Đỗ Duy  wrote:

> Dear Xiaoxiang, Sirs/Madams
>
> I face an issue with deleting data of user according to GPDR-like policy
> which means when user send request to delete their personal data, we need
> to delete it from all system, that means to delete data:
>
> 1- from Kylin index (cube)
> 2- from Hive
> 3- from HDFS
>
> Have you had the same use-case before, do you have any suggestions to
> achieve this scenario?
>
> Thank you very much and best regards
>


Re: ACID with Hive/Kylin

2023-12-11 Thread ShaoFeng Shi
Hi Nam,

As Kylin is used to store aggregated data, there should be no PII in it.
(If you use Kylin to manage person-level data, that is not a good use
case.)

If you do need to delete certain personal data, refreshing the whole index
or some partitions is what we can do.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC,
Apache Incubator PMC,
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org




On Tue, Dec 12, 2023 at 12:11, Nam Đỗ Duy wrote:

> Dear Xiaoxiang, Sirs/Madams
>
> I face an issue with deleting data of user according to GPDR-like policy
> which means when user send request to delete their personal data, we need
> to delete it from all system, that means to delete data:
>
> 1- from Kylin index (cube)
> 2- from Hive
> 3- from HDFS
>
> Have you had the same use-case before, do you have any suggestions to
> achieve this scenario?
>
> Thank you very much and best regards
>


ACID with Hive/Kylin

2023-12-11 Thread Nam Đỗ Duy via user
Dear Xiaoxiang, Sirs/Madams

I face an issue with deleting user data under a GDPR-like policy: when a
user sends a request to delete their personal data, we need to delete it
from all systems, which means deleting the data:

1- from Kylin index (cube)
2- from Hive
3- from HDFS

Have you had the same use case before? Do you have any suggestions for
achieving this?

Thank you very much and best regards


Re: Pinot/Kylin/Druid quick comparision

2023-12-10 Thread Nam Đỗ Duy via user
Thank you very much. Please kindly start on the Kylin-Kerberos document and
JDBC connectivity. We will actively participate in testing the JDBC source
when it is available, so please let us know.

Best regards

On Mon, Dec 11, 2023 at 2:15 PM Xiaoxiang Yu  wrote:

> 1. JDBC source is a feature which in development, it will be supported
> later.
>
> 2. Kylin supports kerberos now, I will write a doc as soon as possible.
> (I will let you know.)
>
> 3. I think ranger and Kerberos are not doing the same things, one for
> authentication, one for authorization. So they cannot replace each other.
> Ranger can integrate with Kerberos, please check ranger's website for
> information.
>
> 
> With warm regard
> Xiaoxiang Yu
>
>
>
> On Sat, Dec 9, 2023 at 8:01 AM Nam Đỗ Duy  wrote:
>
> > Thank you Xiaoxiang for your reply
> >
> > -
> > Do you have any suggestions/wishes for kylin 5(except real-time feature)?
> > -
> > Yes: please answer to help me clear this headache:
> >
> > 1. Can Kylin access the existing star schema in Oracle datawarehouse ? If
> > not then do we have any work around?
> >
> > 2. My team is using kerberos for authentication, do you have any
> > document/casestudy about integrating kerberos with kylin 4.x and kylin
> 5.x
> >
> > 3. Should we use apache ranger instead of kerberos for authentication and
> > for security purposes?
> >
> > Thank you again
> >
> > On Thu, 7 Dec 2023 at 15:00 Xiaoxiang Yu  wrote:
> >
> > > I guess the release date should be 2024/01 .
> > > Do you have any suggestions/wishes for kylin 5(except real-time
> feature)?
> > >
> > > 
> > > With warm regard
> > > Xiaoxiang Yu
> > >
> > >
> > >
> > > On Thu, Dec 7, 2023 at 3:44 PM Nam Đỗ Duy 
> > wrote:
> > >
> > >> Thank you very much xiaoxiang, I did the presentation this morning
> > already
> > >> so there is no time for you to comment. Next time I will send you in
> > >> advance. The meeting result was that we will implement both druid and
> > >> kylin
> > >> in the next couple of projects because of its realtime feature. Hope
> > that
> > >> kylin will have same feature soon.
> > >>
> > >> May I ask when will you release kylin 5.0?
> > >>
> > >> On Thu, Dec 7, 2023 at 9:26 AM Xiaoxiang Yu  wrote:
> > >>
> > >> > Since 2018 there are a lot of new features and code refactor.
> > >> > If you like, you can share your ppt to me privately, maybe I can
> > >> > give some comments.
> > >> >
> > >> > Here is the reference of advantages of Kylin since 2018:
> > >> > - https://kylin.apache.org/blog/2022/01/12/The-Future-Of-Kylin/
> > >> > -
> > >> >
> > >>
> >
> https://kylin.apache.org/blog/2021/07/02/Apache-Kylin4-A-new-storage-and-compute-architecture/
> > >> > - https://kylin.apache.org/5.0/docs/development/roadmap
> > >> >
> > >> > 
> > >> > With warm regard
> > >> > Xiaoxiang Yu
> > >> >
> > >> >
> > >> >
> > >> > On Wed, Dec 6, 2023 at 6:53 PM Nam Đỗ Duy 
> > >> wrote:
> > >> >
> > >> >> Hi Xiaoxiang, tomorrow is the main presentation between Kylin and
> > >> Druid in
> > >> >> my team.
> > >> >>
> > >> >> I found this article and would like you to update me the advantages
> > of
> > >> >> Kylin since 2018 until now (especially with version 5 to be
> released)
> > >> >>
> > >> >> Apache Kylin | Why did Meituan develop Kylin On Druid (part 1 of
> 2)?
> > >> >> <
> > >> >>
> > >>
> >
> https://kylin.apache.org/blog/2018/12/12/why-did-meituan-develop-kylin-on-druid-part1-of-2/
> > >> >> >
> > >> >>
> > >> >> On Wed, Dec 6, 2023 at 9:34 AM Nam Đỗ Duy  wrote:
> > >> >>
> > >> >> > Thank you very much for your prompt response, I still have
> several
> > >> >> > questions to seek for your help later.
> > >> >> >
> > >> >> > Best regards and have a good day
> > >> >> >
> > >> >> >
> > >> >> >
> > >> >> > On Wed, Dec 6, 2023 at 9:11 AM Xiaoxiang Yu 
> > wrote:
> > >> >> >
> > >> >> >> Done. Github branch changed to kylin5.
> > >> >> >>
> > >> >> >> 
> > >> >> >> With warm regard
> > >> >> >> Xiaoxiang Yu
> > >> >> >>
> > >> >> >>
> > >> >> >>
> > >> >> >> On Tue, Dec 5, 2023 at 11:13 AM Xiaoxiang Yu 
> > >> wrote:
> > >> >> >>
> > >> >> >> > A JIRA ticket has been opened, waiting for INFRA :
> > >> >> >> > https://issues.apache.org/jira/browse/INFRA-25238 .
> > >> >> >> > 
> > >> >> >> > With warm regard
> > >> >> >> > Xiaoxiang Yu
> > >> >> >> >
> > >> >> >> >
> > >> >> >> >
> > >> >> >> > On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy
> >  > >> >
> > >> >> >> wrote:
> > >> >> >> >
> > >> >> >> >> Thank you Xiaoxiang, please update me when you have changed
> > your
> > >> >> >> default
> > >> >> >> >> branch. In case people are impressed by the numbers then I
> hope
> > >> to
> > >> >> turn
> > >> >> >> >> this situation to reverse direction.
> > >> >> >> >>
> > >> >> >> >> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu  >
> > >> >> wrote:
> > >> >> >> >>
> > >> >> >> >>> The default branch is for 4.X which is 

Re: Pinot/Kylin/Druid quick comparision

2023-12-10 Thread Xiaoxiang Yu
1. The JDBC source is a feature that is in development; it will be
supported later.

2. Kylin supports Kerberos now; I will write a doc as soon as possible.
(I will let you know.)

3. I think Ranger and Kerberos do not do the same thing: Kerberos is for
authentication and Ranger is for authorization, so they cannot replace each
other. Ranger can integrate with Kerberos; please check Ranger's website
for more information.
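
To illustrate point 3, here is a toy sketch in plain Python. It does not use
the Kerberos or Ranger APIs; the user names, secrets, resource name, and
policy table are all made up. Authentication answers "who are you?",
authorization answers "what may you do?", and a request has to pass both
checks:

```python
import hmac

# Hypothetical credential store: the part Kerberos plays (authentication).
CREDENTIALS = {"alice": "s3cret", "bob": "hunter2"}

# Hypothetical policy table: the part Ranger plays (authorization).
POLICIES = {
    ("alice", "sales_model"): {"query"},
    ("bob", "sales_model"): {"query", "build"},
}


def authenticate(user, secret):
    """Return True if the caller proves the claimed identity."""
    expected = CREDENTIALS.get(user)
    return expected is not None and hmac.compare_digest(expected, secret)


def authorize(user, resource, action):
    """Return True if an already authenticated user may perform the action."""
    return action in POLICIES.get((user, resource), set())


def handle(user, secret, resource, action):
    """Run both checks in order: identity first, then permission."""
    if not authenticate(user, secret):
        return "401 unauthenticated"
    if not authorize(user, resource, action):
        return "403 forbidden"
    return "200 ok"
```

A known user with the right secret can still be refused an action (the
authorization step), which is why one system cannot stand in for the other.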


With warm regard
Xiaoxiang Yu



On Sat, Dec 9, 2023 at 8:01 AM Nam Đỗ Duy  wrote:

> Thank you Xiaoxiang for your reply
>
> -
> Do you have any suggestions/wishes for kylin 5(except real-time feature)?
> -
> Yes: please answer to help me clear this headache:
>
> 1. Can Kylin access the existing star schema in Oracle datawarehouse ? If
> not then do we have any work around?
>
> 2. My team is using kerberos for authentication, do you have any
> document/casestudy about integrating kerberos with kylin 4.x and kylin 5.x
>
> 3. Should we use apache ranger instead of kerberos for authentication and
> for security purposes?
>
> Thank you again
>
> On Thu, 7 Dec 2023 at 15:00 Xiaoxiang Yu  wrote:
>
> > I guess the release date should be 2024/01 .
> > Do you have any suggestions/wishes for kylin 5(except real-time feature)?
> >
> > 
> > With warm regard
> > Xiaoxiang Yu
> >
> >
> >
> > On Thu, Dec 7, 2023 at 3:44 PM Nam Đỗ Duy 
> wrote:
> >
> >> Thank you very much xiaoxiang, I did the presentation this morning
> already
> >> so there is no time for you to comment. Next time I will send you in
> >> advance. The meeting result was that we will implement both druid and
> >> kylin
> >> in the next couple of projects because of its realtime feature. Hope
> that
> >> kylin will have same feature soon.
> >>
> >> May I ask when will you release kylin 5.0?
> >>
> >> On Thu, Dec 7, 2023 at 9:26 AM Xiaoxiang Yu  wrote:
> >>
> >> > Since 2018 there are a lot of new features and code refactor.
> >> > If you like, you can share your ppt to me privately, maybe I can
> >> > give some comments.
> >> >
> >> > Here is the reference of advantages of Kylin since 2018:
> >> > - https://kylin.apache.org/blog/2022/01/12/The-Future-Of-Kylin/
> >> > -
> >> >
> >>
> https://kylin.apache.org/blog/2021/07/02/Apache-Kylin4-A-new-storage-and-compute-architecture/
> >> > - https://kylin.apache.org/5.0/docs/development/roadmap
> >> >
> >> > 
> >> > With warm regard
> >> > Xiaoxiang Yu
> >> >
> >> >
> >> >
> >> > On Wed, Dec 6, 2023 at 6:53 PM Nam Đỗ Duy 
> >> wrote:
> >> >
> >> >> Hi Xiaoxiang, tomorrow is the main presentation between Kylin and
> >> Druid in
> >> >> my team.
> >> >>
> >> >> I found this article and would like you to update me the advantages
> of
> >> >> Kylin since 2018 until now (especially with version 5 to be released)
> >> >>
> >> >> Apache Kylin | Why did Meituan develop Kylin On Druid (part 1 of 2)?
> >> >> <
> >> >>
> >>
> https://kylin.apache.org/blog/2018/12/12/why-did-meituan-develop-kylin-on-druid-part1-of-2/
> >> >> >
> >> >>
> >> >> On Wed, Dec 6, 2023 at 9:34 AM Nam Đỗ Duy  wrote:
> >> >>
> >> >> > Thank you very much for your prompt response, I still have several
> >> >> > questions to seek for your help later.
> >> >> >
> >> >> > Best regards and have a good day
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Wed, Dec 6, 2023 at 9:11 AM Xiaoxiang Yu 
> wrote:
> >> >> >
> >> >> >> Done. Github branch changed to kylin5.
> >> >> >>
> >> >> >> 
> >> >> >> With warm regard
> >> >> >> Xiaoxiang Yu
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> On Tue, Dec 5, 2023 at 11:13 AM Xiaoxiang Yu 
> >> wrote:
> >> >> >>
> >> >> >> > A JIRA ticket has been opened, waiting for INFRA :
> >> >> >> > https://issues.apache.org/jira/browse/INFRA-25238 .
> >> >> >> > 
> >> >> >> > With warm regard
> >> >> >> > Xiaoxiang Yu
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> > On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy
>  >> >
> >> >> >> wrote:
> >> >> >> >
> >> >> >> >> Thank you Xiaoxiang, please update me when you have changed
> your
> >> >> >> default
> >> >> >> >> branch. In case people are impressed by the numbers then I hope
> >> to
> >> >> turn
> >> >> >> >> this situation to reverse direction.
> >> >> >> >>
> >> >> >> >> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu 
> >> >> wrote:
> >> >> >> >>
> >> >> >> >>> The default branch is for 4.X which is a maintained branch,
> the
> >> >> active
> >> >> >> >>> branch is kylin5.
> >> >> >> >>> I will change the default branch to kylin5 later.
> >> >> >> >>>
> >> >> >> >>> 
> >> >> >> >>> With warm regard
> >> >> >> >>> Xiaoxiang Yu
> >> >> >> >>>
> >> >> >> >>>
> >> >> >> >>>
> >> >> >> >>> On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy
> >> 
> >> >> >> >>> wrote:
> >> >> >> >>>
> >> >> >>  Hi Xiaoxiang, Sirs / Madams
> >> >> >> 
> >> >> >>  Can you see the atttached photo
> >> >> >> 
> >> >> >>  My boss asked that why 

Re: Pinot/Kylin/Druid quick comparision

2023-12-08 Thread Nam Đỗ Duy via user
Thank you Xiaoxiang for your reply

-
Do you have any suggestions/wishes for kylin 5(except real-time feature)?
-
Yes, please answer these to help me clear up this headache:

1. Can Kylin access an existing star schema in an Oracle data warehouse? If
not, is there any workaround?

2. My team is using Kerberos for authentication. Do you have any document
or case study about integrating Kerberos with Kylin 4.x and Kylin 5.x?

3. Should we use Apache Ranger instead of Kerberos for authentication and
for security purposes?

Thank you again

On Thu, 7 Dec 2023 at 15:00 Xiaoxiang Yu  wrote:

> I guess the release date should be 2024/01 .
> Do you have any suggestions/wishes for kylin 5(except real-time feature)?
>
> 
> With warm regard
> Xiaoxiang Yu
>
>
>
> On Thu, Dec 7, 2023 at 3:44 PM Nam Đỗ Duy  wrote:
>
>> Thank you very much xiaoxiang, I did the presentation this morning already
>> so there is no time for you to comment. Next time I will send you in
>> advance. The meeting result was that we will implement both druid and
>> kylin
>> in the next couple of projects because of its realtime feature. Hope that
>> kylin will have same feature soon.
>>
>> May I ask when will you release kylin 5.0?
>>
>> On Thu, Dec 7, 2023 at 9:26 AM Xiaoxiang Yu  wrote:
>>
>> > Since 2018 there are a lot of new features and code refactor.
>> > If you like, you can share your ppt to me privately, maybe I can
>> > give some comments.
>> >
>> > Here is the reference of advantages of Kylin since 2018:
>> > - https://kylin.apache.org/blog/2022/01/12/The-Future-Of-Kylin/
>> > -
>> >
>> https://kylin.apache.org/blog/2021/07/02/Apache-Kylin4-A-new-storage-and-compute-architecture/
>> > - https://kylin.apache.org/5.0/docs/development/roadmap
>> >
>> > 
>> > With warm regard
>> > Xiaoxiang Yu
>> >
>> >
>> >
>> > On Wed, Dec 6, 2023 at 6:53 PM Nam Đỗ Duy 
>> wrote:
>> >
>> >> Hi Xiaoxiang, tomorrow is the main presentation between Kylin and
>> Druid in
>> >> my team.
>> >>
>> >> I found this article and would like you to update me the advantages of
>> >> Kylin since 2018 until now (especially with version 5 to be released)
>> >>
>> >> Apache Kylin | Why did Meituan develop Kylin On Druid (part 1 of 2)?
>> >> <
>> >>
>> https://kylin.apache.org/blog/2018/12/12/why-did-meituan-develop-kylin-on-druid-part1-of-2/
>> >> >
>> >>
>> >> On Wed, Dec 6, 2023 at 9:34 AM Nam Đỗ Duy  wrote:
>> >>
>> >> > Thank you very much for your prompt response, I still have several
>> >> > questions to seek for your help later.
>> >> >
>> >> > Best regards and have a good day
>> >> >
>> >> >
>> >> >
>> >> > On Wed, Dec 6, 2023 at 9:11 AM Xiaoxiang Yu  wrote:
>> >> >
>> >> >> Done. Github branch changed to kylin5.
>> >> >>
>> >> >> 
>> >> >> With warm regard
>> >> >> Xiaoxiang Yu
>> >> >>
>> >> >>
>> >> >>
>> >> >> On Tue, Dec 5, 2023 at 11:13 AM Xiaoxiang Yu 
>> wrote:
>> >> >>
>> >> >> > A JIRA ticket has been opened, waiting for INFRA :
>> >> >> > https://issues.apache.org/jira/browse/INFRA-25238 .
>> >> >> > 
>> >> >> > With warm regard
>> >> >> > Xiaoxiang Yu
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy > >
>> >> >> wrote:
>> >> >> >
>> >> >> >> Thank you Xiaoxiang, please update me when you have changed your
>> >> >> default
>> >> >> >> branch. In case people are impressed by the numbers then I hope
>> to
>> >> turn
>> >> >> >> this situation to reverse direction.
>> >> >> >>
>> >> >> >> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu 
>> >> wrote:
>> >> >> >>
>> >> >> >>> The default branch is for 4.X which is a maintained branch, the
>> >> active
>> >> >> >>> branch is kylin5.
>> >> >> >>> I will change the default branch to kylin5 later.
>> >> >> >>>
>> >> >> >>> 
>> >> >> >>> With warm regard
>> >> >> >>> Xiaoxiang Yu
>> >> >> >>>
>> >> >> >>>
>> >> >> >>>
>> >> >> >>> On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy
>> 
>> >> >> >>> wrote:
>> >> >> >>>
>> >> >>  Hi Xiaoxiang, Sirs / Madams
>> >> >> 
>> >> >>  Can you see the atttached photo
>> >> >> 
>> >> >>  My boss asked that why druid commit code regularly but kylin
>> had
>> >> not
>> >> >>  been committed since July
>> >> >> 
>> >> >> 
>> >> >>  On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu 
>> wrote:
>> >> >> 
>> >> >> > I think so.
>> >> >> >
>> >> >> > Response time is not the only factor to make a decision. Kylin
>> >> could
>> >> >> > be cheaper
>> >> >> > when the query pattern is suitable for the Kylin model, and
>> Kylin
>> >> >> can
>> >> >> > guarantee
>> >> >> > reasonable query latency. Clickhouse will be quicker in an ad
>> hoc
>> >> >> > query scenario.
>> >> >> >
>> >> >> > By the way, Youzan and Kyligence combine them together to
>> provide
>> >> >> > unified data analytics services for their customers.
>> >> >> >
>> >> >> > 

Re: Pinot/Kylin/Druid quick comparision

2023-12-07 Thread Xiaoxiang Yu
I guess the release date should be 2024/01.
Do you have any suggestions/wishes for Kylin 5 (except the real-time feature)?


With warm regard
Xiaoxiang Yu



On Thu, Dec 7, 2023 at 3:44 PM Nam Đỗ Duy  wrote:

> Thank you very much xiaoxiang, I did the presentation this morning already
> so there is no time for you to comment. Next time I will send you in
> advance. The meeting result was that we will implement both druid and kylin
> in the next couple of projects because of its realtime feature. Hope that
> kylin will have same feature soon.
>
> May I ask when will you release kylin 5.0?
>
> On Thu, Dec 7, 2023 at 9:26 AM Xiaoxiang Yu  wrote:
>
> > Since 2018 there are a lot of new features and code refactor.
> > If you like, you can share your ppt to me privately, maybe I can
> > give some comments.
> >
> > Here is the reference of advantages of Kylin since 2018:
> > - https://kylin.apache.org/blog/2022/01/12/The-Future-Of-Kylin/
> > -
> >
> https://kylin.apache.org/blog/2021/07/02/Apache-Kylin4-A-new-storage-and-compute-architecture/
> > - https://kylin.apache.org/5.0/docs/development/roadmap
> >
> > 
> > With warm regard
> > Xiaoxiang Yu
> >
> >
> >
> > On Wed, Dec 6, 2023 at 6:53 PM Nam Đỗ Duy 
> wrote:
> >
> >> Hi Xiaoxiang, tomorrow is the main presentation between Kylin and Druid
> in
> >> my team.
> >>
> >> I found this article and would like you to update me the advantages of
> >> Kylin since 2018 until now (especially with version 5 to be released)
> >>
> >> Apache Kylin | Why did Meituan develop Kylin On Druid (part 1 of 2)?
> >> <
> >>
> https://kylin.apache.org/blog/2018/12/12/why-did-meituan-develop-kylin-on-druid-part1-of-2/
> >> >
> >>
> >> On Wed, Dec 6, 2023 at 9:34 AM Nam Đỗ Duy  wrote:
> >>
> >> > Thank you very much for your prompt response, I still have several
> >> > questions to seek for your help later.
> >> >
> >> > Best regards and have a good day
> >> >
> >> >
> >> >
> >> > On Wed, Dec 6, 2023 at 9:11 AM Xiaoxiang Yu  wrote:
> >> >
> >> >> Done. Github branch changed to kylin5.
> >> >>
> >> >> 
> >> >> With warm regard
> >> >> Xiaoxiang Yu
> >> >>
> >> >>
> >> >>
> >> >> On Tue, Dec 5, 2023 at 11:13 AM Xiaoxiang Yu 
> wrote:
> >> >>
> >> >> > A JIRA ticket has been opened, waiting for INFRA :
> >> >> > https://issues.apache.org/jira/browse/INFRA-25238 .
> >> >> > 
> >> >> > With warm regard
> >> >> > Xiaoxiang Yu
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy  >
> >> >> wrote:
> >> >> >
> >> >> >> Thank you Xiaoxiang, please update me when you have changed your
> >> >> default
> >> >> >> branch. In case people are impressed by the numbers then I hope to
> >> turn
> >> >> >> this situation to reverse direction.
> >> >> >>
> >> >> >> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu 
> >> wrote:
> >> >> >>
> >> >> >>> The default branch is for 4.X which is a maintained branch, the
> >> active
> >> >> >>> branch is kylin5.
> >> >> >>> I will change the default branch to kylin5 later.
> >> >> >>>
> >> >> >>> 
> >> >> >>> With warm regard
> >> >> >>> Xiaoxiang Yu
> >> >> >>>
> >> >> >>>
> >> >> >>>
> >> >> >>> On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy  >
> >> >> >>> wrote:
> >> >> >>>
> >> >>  Hi Xiaoxiang, Sirs / Madams
> >> >> 
> >> >>  Can you see the atttached photo
> >> >> 
> >> >>  My boss asked that why druid commit code regularly but kylin had
> >> not
> >> >>  been committed since July
> >> >> 
> >> >> 
> >> >>  On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu 
> wrote:
> >> >> 
> >> >> > I think so.
> >> >> >
> >> >> > Response time is not the only factor to make a decision. Kylin
> >> could
> >> >> > be cheaper
> >> >> > when the query pattern is suitable for the Kylin model, and
> Kylin
> >> >> can
> >> >> > guarantee
> >> >> > reasonable query latency. Clickhouse will be quicker in an ad
> hoc
> >> >> > query scenario.
> >> >> >
> >> >> > By the way, Youzan and Kyligence combine them together to
> provide
> >> >> > unified data analytics services for their customers.
> >> >> >
> >> >> > 
> >> >> > With warm regard
> >> >> > Xiaoxiang Yu
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy
>  >> >
> >> >> > wrote:
> >> >> >
> >> >> >> Hi Xiaoxiang, thank you
> >> >> >>
> >> >> >> In case my client uses cloud computing service like gcp or
> aws,
> >> >> which
> >> >> >> will cost more: precalculation feature of kylin or clickhouse
> >> >> (incase
> >> >> >> of
> >> >> >> kylin, I have a thought that the query execution has been done
> >> once
> >> >> >> and
> >> >> >> stored in cube to be used many times so kylin uses less cloud
> >> >> >> computation,
> >> >> >> is that true)?
> >> >> >>
> >> >> >> On Mon, Dec 4, 

Re: Pinot/Kylin/Druid quick comparision

2023-12-06 Thread Nam Đỗ Duy via user
Thank you very much, Xiaoxiang. I already gave the presentation this
morning, so there was no time for you to comment. Next time I will send it
to you in advance. The outcome of the meeting was that we will implement
both Druid and Kylin in the next couple of projects, Druid because of its
real-time feature. I hope Kylin will have the same feature soon.

May I ask when you will release Kylin 5.0?

On Thu, Dec 7, 2023 at 9:26 AM Xiaoxiang Yu  wrote:

> Since 2018 there are a lot of new features and code refactor.
> If you like, you can share your ppt to me privately, maybe I can
> give some comments.
>
> Here is the reference of advantages of Kylin since 2018:
> - https://kylin.apache.org/blog/2022/01/12/The-Future-Of-Kylin/
> -
> https://kylin.apache.org/blog/2021/07/02/Apache-Kylin4-A-new-storage-and-compute-architecture/
> - https://kylin.apache.org/5.0/docs/development/roadmap
>
> 
> With warm regard
> Xiaoxiang Yu
>
>
>
> On Wed, Dec 6, 2023 at 6:53 PM Nam Đỗ Duy  wrote:
>
>> Hi Xiaoxiang, tomorrow is the main presentation between Kylin and Druid in
>> my team.
>>
>> I found this article and would like you to update me the advantages of
>> Kylin since 2018 until now (especially with version 5 to be released)
>>
>> Apache Kylin | Why did Meituan develop Kylin On Druid (part 1 of 2)?
>> <
>> https://kylin.apache.org/blog/2018/12/12/why-did-meituan-develop-kylin-on-druid-part1-of-2/
>> >
>>
>> On Wed, Dec 6, 2023 at 9:34 AM Nam Đỗ Duy  wrote:
>>
>> > Thank you very much for your prompt response, I still have several
>> > questions to seek for your help later.
>> >
>> > Best regards and have a good day
>> >
>> >
>> >
>> > On Wed, Dec 6, 2023 at 9:11 AM Xiaoxiang Yu  wrote:
>> >
>> >> Done. Github branch changed to kylin5.
>> >>
>> >> 
>> >> With warm regard
>> >> Xiaoxiang Yu
>> >>
>> >>
>> >>
>> >> On Tue, Dec 5, 2023 at 11:13 AM Xiaoxiang Yu  wrote:
>> >>
>> >> > A JIRA ticket has been opened, waiting for INFRA :
>> >> > https://issues.apache.org/jira/browse/INFRA-25238 .
>> >> > 
>> >> > With warm regard
>> >> > Xiaoxiang Yu
>> >> >
>> >> >
>> >> >
>> >> > On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy 
>> >> wrote:
>> >> >
>> >> >> Thank you Xiaoxiang, please update me when you have changed your
>> >> default
>> >> >> branch. In case people are impressed by the numbers then I hope to
>> turn
>> >> >> this situation to reverse direction.
>> >> >>
>> >> >> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu 
>> wrote:
>> >> >>
>> >> >>> The default branch is for 4.X which is a maintained branch, the
>> active
>> >> >>> branch is kylin5.
>> >> >>> I will change the default branch to kylin5 later.
>> >> >>>
>> >> >>> 
>> >> >>> With warm regard
>> >> >>> Xiaoxiang Yu
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>> On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy 
>> >> >>> wrote:
>> >> >>>
>> >>  Hi Xiaoxiang, Sirs / Madams
>> >> 
>> >>  Can you see the atttached photo
>> >> 
>> >>  My boss asked that why druid commit code regularly but kylin had
>> not
>> >>  been committed since July
>> >> 
>> >> 
>> >>  On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu  wrote:
>> >> 
>> >> > I think so.
>> >> >
>> >> > Response time is not the only factor to make a decision. Kylin
>> could
>> >> > be cheaper
>> >> > when the query pattern is suitable for the Kylin model, and Kylin
>> >> can
>> >> > guarantee
>> >> > reasonable query latency. Clickhouse will be quicker in an ad hoc
>> >> > query scenario.
>> >> >
>> >> > By the way, Youzan and Kyligence combine them together to provide
>> >> > unified data analytics services for their customers.
>> >> >
>> >> > 
>> >> > With warm regard
>> >> > Xiaoxiang Yu
>> >> >
>> >> >
>> >> >
>> >> > On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy > >
>> >> > wrote:
>> >> >
>> >> >> Hi Xiaoxiang, thank you
>> >> >>
>> >> >> In case my client uses cloud computing service like gcp or aws,
>> >> which
>> >> >> will cost more: precalculation feature of kylin or clickhouse
>> >> (incase
>> >> >> of
>> >> >> kylin, I have a thought that the query execution has been done
>> once
>> >> >> and
>> >> >> stored in cube to be used many times so kylin uses less cloud
>> >> >> computation,
>> >> >> is that true)?
>> >> >>
>> >> >> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu 
>> >> wrote:
>> >> >>
>> >> >> > Following text is part of an article(
>> >> >> > https://zhuanlan.zhihu.com/p/343394287) .
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >>
>> >>
>> ===
>> >> >> >
>> >> >> > Kylin is suitable for aggregation queries with fixed modes
>> >> because
>> >> >> of its
>> >> >> > pre-calculated technology, for example, join, group by, and
>> where
>> >> >> 

confirm unsubscribe

2023-12-06 Thread chinacsj


confirm unsubscribe

chinacsj
15180809...@139.com
15180809092

Re: Pinot/Kylin/Druid quick comparision

2023-12-06 Thread Xiaoxiang Yu
Since 2018 there have been a lot of new features and code refactoring.
If you like, you can share your slides with me privately, and maybe I can
give some comments.

Here are some references on the advantages of Kylin since 2018:
- https://kylin.apache.org/blog/2022/01/12/The-Future-Of-Kylin/
-
https://kylin.apache.org/blog/2021/07/02/Apache-Kylin4-A-new-storage-and-compute-architecture/
- https://kylin.apache.org/5.0/docs/development/roadmap


With warm regard
Xiaoxiang Yu



On Wed, Dec 6, 2023 at 6:53 PM Nam Đỗ Duy  wrote:

> Hi Xiaoxiang, tomorrow is the main presentation between Kylin and Druid in
> my team.
>
> I found this article and would like you to update me the advantages of
> Kylin since 2018 until now (especially with version 5 to be released)
>
> Apache Kylin | Why did Meituan develop Kylin On Druid (part 1 of 2)?
> <
> https://kylin.apache.org/blog/2018/12/12/why-did-meituan-develop-kylin-on-druid-part1-of-2/
> >
>
> On Wed, Dec 6, 2023 at 9:34 AM Nam Đỗ Duy  wrote:
>
> > Thank you very much for your prompt response, I still have several
> > questions to seek for your help later.
> >
> > Best regards and have a good day
> >
> >
> >
> > On Wed, Dec 6, 2023 at 9:11 AM Xiaoxiang Yu  wrote:
> >
> >> Done. Github branch changed to kylin5.
> >>
> >> 
> >> With warm regard
> >> Xiaoxiang Yu
> >>
> >>
> >>
> >> On Tue, Dec 5, 2023 at 11:13 AM Xiaoxiang Yu  wrote:
> >>
> >> > A JIRA ticket has been opened, waiting for INFRA :
> >> > https://issues.apache.org/jira/browse/INFRA-25238 .
> >> > 
> >> > With warm regard
> >> > Xiaoxiang Yu
> >> >
> >> >
> >> >
> >> > On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy 
> >> wrote:
> >> >
> >> >> Thank you Xiaoxiang, please update me when you have changed your
> >> default
> >> >> branch. In case people are impressed by the numbers then I hope to
> turn
> >> >> this situation to reverse direction.
> >> >>
> >> >> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu  wrote:
> >> >>
> >> >>> The default branch is for 4.X which is a maintained branch, the
> active
> >> >>> branch is kylin5.
> >> >>> I will change the default branch to kylin5 later.
> >> >>>
> >> >>> 
> >> >>> With warm regard
> >> >>> Xiaoxiang Yu
> >> >>>
> >> >>>
> >> >>>
> >> >>> On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy 
> >> >>> wrote:
> >> >>>
> >>  Hi Xiaoxiang, Sirs / Madams
> >> 
> >>  Can you see the atttached photo
> >> 
> >>  My boss asked that why druid commit code regularly but kylin had
> not
> >>  been committed since July
> >> 
> >> 
> >>  On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu  wrote:
> >> 
> >> > I think so.
> >> >
> >> > Response time is not the only factor to make a decision. Kylin
> could
> >> > be cheaper
> >> > when the query pattern is suitable for the Kylin model, and Kylin
> >> can
> >> > guarantee
> >> > reasonable query latency. Clickhouse will be quicker in an ad hoc
> >> > query scenario.
> >> >
> >> > By the way, Youzan and Kyligence combine them together to provide
> >> > unified data analytics services for their customers.
> >> >
> >> > 
> >> > With warm regard
> >> > Xiaoxiang Yu
> >> >
> >> >
> >> >
> >> > On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy  >
> >> > wrote:
> >> >
> >> >> Hi Xiaoxiang, thank you
> >> >>
> >> >> In case my client uses cloud computing service like gcp or aws,
> >> which
> >> >> will cost more: precalculation feature of kylin or clickhouse
> >> (incase
> >> >> of
> >> >> kylin, I have a thought that the query execution has been done
> once
> >> >> and
> >> >> stored in cube to be used many times so kylin uses less cloud
> >> >> computation,
> >> >> is that true)?
> >> >>
> >> >> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu 
> >> wrote:
> >> >>
> >> >> > Following text is part of an article(
> >> >> > https://zhuanlan.zhihu.com/p/343394287) .
> >> >> >
> >> >> >
> >> >> >
> >> >>
> >>
> ===
> >> >> >
> >> >> > Kylin is suitable for aggregation queries with fixed modes
> >> because
> >> >> of its
> >> >> > pre-calculated technology, for example, join, group by, and
> where
> >> >> condition
> >> >> > modes in SQL are relatively fixed, etc. The larger the data
> >> volume
> >> >> is, the
> >> >> > more obvious the advantages of using Kylin are; in particular,
> >> >> Kylin is
> >> >> > particularly advantageous in the scenarios of de-emphasis
> (count
> >> >> distinct),
> >> >> > Top N, and Percentile. In particular, Kylin's advantages in
> >> >> de-weighting
> >> >> > (count distinct), Top N, Percentile and other scenarios are
> >> >> especially
> >> >> > huge, and it is used in a large number of scenarios, such as
> >> >> 

Re: Pinot/Kylin/Druid quick comparision

2023-12-06 Thread Nam Đỗ Duy via user
Hi Xiaoxiang, tomorrow is the main presentation comparing Kylin and Druid in
my team.

I found this article and would like you to update me on the advantages Kylin
has gained from 2018 until now (especially with version 5 about to be released):

Apache Kylin | Why did Meituan develop Kylin On Druid (part 1 of 2)?
<https://kylin.apache.org/blog/2018/12/12/why-did-meituan-develop-kylin-on-druid-part1-of-2/>

On Wed, Dec 6, 2023 at 9:34 AM Nam Đỗ Duy  wrote:

> Thank you very much for your prompt response, I still have several
> questions to seek for your help later.
>
> Best regards and have a good day
>
>
>
> On Wed, Dec 6, 2023 at 9:11 AM Xiaoxiang Yu  wrote:
>
>> Done. Github branch changed to kylin5.
>>
>> 
>> With warm regard
>> Xiaoxiang Yu
>>
>>
>>
>> On Tue, Dec 5, 2023 at 11:13 AM Xiaoxiang Yu  wrote:
>>
>> > A JIRA ticket has been opened, waiting for INFRA :
>> > https://issues.apache.org/jira/browse/INFRA-25238 .
>> > 
>> > With warm regard
>> > Xiaoxiang Yu
>> >
>> >
>> >
>> > On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy 
>> wrote:
>> >
>> >> Thank you Xiaoxiang, please update me when you have changed your
>> default
>> >> branch. In case people are impressed by the numbers then I hope to turn
>> >> this situation to reverse direction.
>> >>
>> >> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu  wrote:
>> >>
>> >>> The default branch is for 4.X which is a maintained branch, the active
>> >>> branch is kylin5.
>> >>> I will change the default branch to kylin5 later.
>> >>>
>> >>> 
>> >>> With warm regard
>> >>> Xiaoxiang Yu
>> >>>
>> >>>
>> >>>
>> >>> On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy 
>> >>> wrote:
>> >>>
>>  Hi Xiaoxiang, Sirs / Madams
>> 
>>  Can you see the atttached photo
>> 
>>  My boss asked that why druid commit code regularly but kylin had not
>>  been committed since July
>> 
>> 
>>  On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu  wrote:
>> 
>> > I think so.
>> >
>> > Response time is not the only factor to make a decision. Kylin could
>> > be cheaper
>> > when the query pattern is suitable for the Kylin model, and Kylin
>> can
>> > guarantee
>> > reasonable query latency. Clickhouse will be quicker in an ad hoc
>> > query scenario.
>> >
>> > By the way, Youzan and Kyligence combine them together to provide
>> > unified data analytics services for their customers.
>> >
>> > 
>> > With warm regard
>> > Xiaoxiang Yu
>> >
>> >
>> >
>> > On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy 
>> > wrote:
>> >
>> >> Hi Xiaoxiang, thank you
>> >>
>> >> In case my client uses cloud computing service like gcp or aws,
>> which
>> >> will cost more: precalculation feature of kylin or clickhouse
>> (incase
>> >> of
>> >> kylin, I have a thought that the query execution has been done once
>> >> and
>> >> stored in cube to be used many times so kylin uses less cloud
>> >> computation,
>> >> is that true)?
>> >>
>> >> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu 
>> wrote:
>> >>
>> >> > Following text is part of an article(
>> >> > https://zhuanlan.zhihu.com/p/343394287) .
>> >> >
>> >> >
>> >> >
>> >>
>> ===
>> >> >
>> >> > Kylin is suitable for aggregation queries with fixed modes
>> because
>> >> of its
>> >> > pre-calculated technology, for example, join, group by, and where
>> >> condition
>> >> > modes in SQL are relatively fixed, etc. The larger the data
>> volume
>> >> is, the
>> >> > more obvious the advantages of using Kylin are; in particular,
>> >> Kylin is
>> >> > particularly advantageous in the scenarios of de-emphasis (count
>> >> distinct),
>> >> > Top N, and Percentile. In particular, Kylin's advantages in
>> >> de-weighting
>> >> > (count distinct), Top N, Percentile and other scenarios are
>> >> especially
>> >> > huge, and it is used in a large number of scenarios, such as
>> >> Dashboard, all
>> >> > kinds of reports, large-screen display, traffic statistics, and
>> user
>> >> > behavior analysis. Meituan, Aurora, Shell Housing, etc. use Kylin
>> >> to build
>> >> > their data service platforms, providing millions to tens of
>> >> millions of
>> >> > queries per day, and most of the queries can be completed within
>> 2
>> >> - 3
>> >> > seconds. There is no better alternative for such a high
>> concurrency
>> >> > scenario.
>> >> >
>> >> > ClickHouse, because of its MPP architecture, has high computing
>> >> power and
>> >> > is more suitable when the query request is more flexible, or when
>> >> there is
>> >> > a need for detailed queries with low concurrency. Scenarios
>> >> include: very
>> >> > many columns and 

Re: Pinot/Kylin/Druid quick comparision

2023-12-05 Thread Nam Đỗ Duy via user
Thank you very much for your prompt response. I still have several
questions and will ask for your help with them later.

Best regards and have a good day



On Wed, Dec 6, 2023 at 9:11 AM Xiaoxiang Yu  wrote:

> Done. Github branch changed to kylin5.
>
> 
> With warm regard
> Xiaoxiang Yu
>
>
>
> On Tue, Dec 5, 2023 at 11:13 AM Xiaoxiang Yu  wrote:
>
> > A JIRA ticket has been opened, waiting for INFRA :
> > https://issues.apache.org/jira/browse/INFRA-25238 .
> > 
> > With warm regard
> > Xiaoxiang Yu
> >
> >
> >
> > On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy 
> wrote:
> >
> >> Thank you Xiaoxiang, please update me when you have changed your default
> >> branch. In case people are impressed by the numbers then I hope to turn
> >> this situation to reverse direction.
> >>
> >> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu  wrote:
> >>
> >>> The default branch is for 4.X which is a maintained branch, the active
> >>> branch is kylin5.
> >>> I will change the default branch to kylin5 later.
> >>>
> >>> 
> >>> With warm regard
> >>> Xiaoxiang Yu
> >>>
> >>>
> >>>
> >>> On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy 
> >>> wrote:
> >>>
>  Hi Xiaoxiang, Sirs / Madams
> 
>  Can you see the atttached photo
> 
>  My boss asked that why druid commit code regularly but kylin had not
>  been committed since July
> 
> 
>  On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu  wrote:
> 
> > I think so.
> >
> > Response time is not the only factor to make a decision. Kylin could
> > be cheaper
> > when the query pattern is suitable for the Kylin model, and Kylin can
> > guarantee
> > reasonable query latency. Clickhouse will be quicker in an ad hoc
> > query scenario.
> >
> > By the way, Youzan and Kyligence combine them together to provide
> > unified data analytics services for their customers.
> >
> > 
> > With warm regard
> > Xiaoxiang Yu
> >
> >
> >
> > On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy 
> > wrote:
> >
> >> Hi Xiaoxiang, thank you
> >>
> >> In case my client uses cloud computing service like gcp or aws,
> which
> >> will cost more: precalculation feature of kylin or clickhouse
> (incase
> >> of
> >> kylin, I have a thought that the query execution has been done once
> >> and
> >> stored in cube to be used many times so kylin uses less cloud
> >> computation,
> >> is that true)?
> >>
> >> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu 
> wrote:
> >>
> >> > Following text is part of an article(
> >> > https://zhuanlan.zhihu.com/p/343394287) .
> >> >
> >> >
> >> >
> >>
> ===
> >> >
> >> > Kylin is suitable for aggregation queries with fixed modes because
> >> of its
> >> > pre-calculated technology, for example, join, group by, and where
> >> condition
> >> > modes in SQL are relatively fixed, etc. The larger the data volume
> >> is, the
> >> > more obvious the advantages of using Kylin are; in particular,
> >> Kylin is
> >> > particularly advantageous in the scenarios of de-emphasis (count
> >> distinct),
> >> > Top N, and Percentile. In particular, Kylin's advantages in
> >> de-weighting
> >> > (count distinct), Top N, Percentile and other scenarios are
> >> especially
> >> > huge, and it is used in a large number of scenarios, such as
> >> Dashboard, all
> >> > kinds of reports, large-screen display, traffic statistics, and
> user
> >> > behavior analysis. Meituan, Aurora, Shell Housing, etc. use Kylin
> >> to build
> >> > their data service platforms, providing millions to tens of
> >> millions of
> >> > queries per day, and most of the queries can be completed within 2
> >> - 3
> >> > seconds. There is no better alternative for such a high
> concurrency
> >> > scenario.
> >> >
> >> > ClickHouse, because of its MPP architecture, has high computing
> >> power and
> >> > is more suitable when the query request is more flexible, or when
> >> there is
> >> > a need for detailed queries with low concurrency. Scenarios
> >> include: very
> >> > many columns and where conditions are arbitrarily combined with
> the
> >> user
> >> > label filtering, not a large amount of concurrency of complex
> >> on-the-spot
> >> > query and so on. If the amount of data and access is large, you
> >> need to
> >> > deploy a distributed ClickHouse cluster, which is a higher
> >> challenge for
> >> > operation and maintenance.
> >> >
> >> > If some queries are very flexible but infrequent, it is more
> >> > resource-efficient to use now-computing. Since the number of
> >> queries is
> >> > small, even if each query 

Re: Pinot/Kylin/Druid quick comparision

2023-12-05 Thread Xiaoxiang Yu
Done. The default GitHub branch has been changed to kylin5.


With warm regard
Xiaoxiang Yu



On Tue, Dec 5, 2023 at 11:13 AM Xiaoxiang Yu  wrote:

> A JIRA ticket has been opened, waiting for INFRA :
> https://issues.apache.org/jira/browse/INFRA-25238 .
> 
> With warm regard
> Xiaoxiang Yu
>
>
>
> On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy  wrote:
>
>> Thank you Xiaoxiang, please update me when you have changed your default
>> branch. In case people are impressed by the numbers then I hope to turn
>> this situation to reverse direction.
>>
>> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu  wrote:
>>
>>> The default branch is for 4.X which is a maintained branch, the active
>>> branch is kylin5.
>>> I will change the default branch to kylin5 later.
>>>
>>> 
>>> With warm regard
>>> Xiaoxiang Yu
>>>
>>>
>>>
>>> On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy 
>>> wrote:
>>>
>>>> Hi Xiaoxiang, Sirs / Madams
>>>>
>>>> Can you see the atttached photo
>>>>
>>>> My boss asked that why druid commit code regularly but kylin had not
>>>> been committed since July
>>>>
>>>>
>>>> On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu  wrote:
>>>>
>>>>> I think so.
>>>>>
>>>>> Response time is not the only factor to make a decision. Kylin could
>>>>> be cheaper
>>>>> when the query pattern is suitable for the Kylin model, and Kylin can
>>>>> guarantee
>>>>> reasonable query latency. Clickhouse will be quicker in an ad hoc
>>>>> query scenario.
>>>>>
>>>>> By the way, Youzan and Kyligence combine them together to provide
>>>>> unified data analytics services for their customers.
>>>>>
>>>>> 
>>>>> With warm regard
>>>>> Xiaoxiang Yu
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy 
>>>>> wrote:
>>>>>
>>>>>> Hi Xiaoxiang, thank you
>>>>>>
>>>>>> In case my client uses cloud computing service like gcp or aws, which
>>>>>> will cost more: precalculation feature of kylin or clickhouse (incase
>>>>>> of
>>>>>> kylin, I have a thought that the query execution has been done once
>>>>>> and
>>>>>> stored in cube to be used many times so kylin uses less cloud
>>>>>> computation,
>>>>>> is that true)?
>>>>>>
>>>>>> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu  wrote:
>>>>>>
>>>>>> > Following text is part of an article(
>>>>>> > https://zhuanlan.zhihu.com/p/343394287) .
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> ===
>>>>>> >
>>>>>> > Kylin is suitable for aggregation queries with fixed modes because
>>>>>> of its
>>>>>> > pre-calculated technology, for example, join, group by, and where
>>>>>> condition
>>>>>> > modes in SQL are relatively fixed, etc. The larger the data volume
>>>>>> is, the
>>>>>> > more obvious the advantages of using Kylin are; in particular,
>>>>>> Kylin is
>>>>>> > particularly advantageous in the scenarios of de-emphasis (count
>>>>>> distinct),
>>>>>> > Top N, and Percentile. In particular, Kylin's advantages in
>>>>>> de-weighting
>>>>>> > (count distinct), Top N, Percentile and other scenarios are
>>>>>> especially
>>>>>> > huge, and it is used in a large number of scenarios, such as
>>>>>> Dashboard, all
>>>>>> > kinds of reports, large-screen display, traffic statistics, and user
>>>>>> > behavior analysis. Meituan, Aurora, Shell Housing, etc. use Kylin
>>>>>> to build
>>>>>> > their data service platforms, providing millions to tens of
>>>>>> millions of
>>>>>> > queries per day, and most of the queries can be completed within 2
>>>>>> - 3
>>>>>> > seconds. There is no better alternative for such a high concurrency
>>>>>> > scenario.
>>>>>> >
>>>>>> > ClickHouse, because of its MPP architecture, has high computing
>>>>>> power and
>>>>>> > is more suitable when the query request is more flexible, or when
>>>>>> there is
>>>>>> > a need for detailed queries with low concurrency. Scenarios
>>>>>> include: very
>>>>>> > many columns and where conditions are arbitrarily combined with the
>>>>>> user
>>>>>> > label filtering, not a large amount of concurrency of complex
>>>>>> on-the-spot
>>>>>> > query and so on. If the amount of data and access is large, you
>>>>>> need to
>>>>>> > deploy a distributed ClickHouse cluster, which is a higher
>>>>>> challenge for
>>>>>> > operation and maintenance.
>>>>>> >
>>>>>> > If some queries are very flexible but infrequent, it is more
>>>>>> > resource-efficient to use now-computing. Since the number of
>>>>>> queries is
>>>>>> > small, even if each query consumes a lot of computational
>>>>>> resources, it is
>>>>>> > still cost-effective overall. If some queries have a fixed pattern
>>>>>> and the
>>>>>> > query volume is large, it is more suitable for Kylin, because the
>>>>>> query
>>>>>> > volume is large, and by using large computational resources to save
>>>>>> the
>>>>>> > results, the upfront computational cost can be amortized over each
>>>>>> query,
>>>>>> > so it is the most economical.
>>>>>> >
>>>>>> > ---
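
The amortization argument in the quoted article can be made concrete with a small calculation. The sketch below is illustrative only: the function names and the cost figures are assumptions for the example, not measurements from Kylin, ClickHouse, or any cloud provider. It compares paying a one-time build cost plus a small per-query serving cost against paying the full computation cost on every query.

```python
import math


def cost_per_query_precomputed(build_cost, serve_cost, n_queries):
    """Average cost of one query when results are built once and reused.

    build_cost: one-time cost of precomputing the results (hypothetical units)
    serve_cost: cost of answering one query from the precomputed results
    n_queries:  number of queries that reuse the precomputed results
    """
    if n_queries <= 0:
        raise ValueError("n_queries must be positive")
    return build_cost / n_queries + serve_cost


def break_even_queries(build_cost, serve_cost, adhoc_cost):
    """Smallest query count at which precomputation costs no more than
    computing every query from scratch.

    Returns None when serving a precomputed result is not cheaper than an
    ad hoc query, because precomputation then never pays off.
    """
    saving_per_query = adhoc_cost - serve_cost
    if saving_per_query <= 0:
        return None
    return math.ceil(build_cost / saving_per_query)


if __name__ == "__main__":
    # Hypothetical figures: a build costs 1000 units, a precomputed lookup
    # costs 1 unit, and an ad hoc scan costs 5 units.
    n = break_even_queries(build_cost=1000, serve_cost=1, adhoc_cost=5)
    print("break-even query count:", n)
    print("cost per query at break-even:",
          cost_per_query_precomputed(1000, 1, n))
```

Under these assumed figures, precomputation only becomes worthwhile once the same query pattern is served a few hundred times, which matches the article's point that fixed, high-volume query patterns favor precomputation while flexible, infrequent queries favor on-the-fly computation.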

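Re: Error when building data in Kylin 4.0.3

2023-12-05 Thread lee
Unsubscribe

> On Dec 5, 2023, at 19:33, 李甜彪  wrote:
> 
> The build fails with an error. The data in Hive is fine, and a build with empty data succeeds. Suspecting a data problem, I hand-wrote a few rows, but the build failed with the same error, which shows that the original data is not the cause.
> The error message shown on the web page is as follows: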
> java.io.IOException: OS command error exit with return code: 1, error 
> message: che.spark.rdd.RDD.iterator(RDD.scala:337)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
> at 
> org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
> at org.apache.spark.scheduler.Task.run(Task.scala:131)
> at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:498)
> at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:501)
> ... 3 more
> 
> }
> RetryInfo{
>overrideConf : {},
>throwable : java.lang.RuntimeException: Error execute 
> org.apache.kylin.engine.spark.job.CubeBuildJob
> at 
> org.apache.kylin.engine.spark.application.SparkApplication.execute(SparkApplication.java:96)
> at org.apache.spark.application.JobWorker$$anon$2.run(JobWorker.scala:55)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 0 in stage 74.0 failed 4 times, most recent failure: Lost task 0.3 in 
> stage 74.0 (TID 186) (store2 executor 20): java.lang.NoClassDefFoundError: 
> Could not initialize class org.apache.hadoop.hive.conf.HiveConf$ConfVars
> at org.apache.hadoop.hive.serde2.lazy.LazySerDeParameters.<init>(LazySerDeParameters.java:103)
> at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.initialize(LazySimpleSerDe.java:125)
> at 
> org.apache.spark.sql.hive.HadoopTableReader.$anonfun$makeRDDForTable$3(TableReader.scala:136)
> at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:863)
> at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:863)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
> at 
> org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
> at org.apache.spark.scheduler.Task.run(Task.scala:131)
> at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:498)
> at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:501)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 
> Driver stacktrace:
> at 
> org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2303)
> at 
> org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2252)
> at 
> org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2251)
> at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
> at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
> at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2251)
> at 
> org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1124)
> at 
> org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1124)
> at scala.Option.foreach(Option.scala:407)
> at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1124)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2490)
> at 
> 

Error when building data in Kylin 4.0.3

2023-12-05 Thread 李甜彪
The build fails with an error. The data in Hive is fine, and a build with empty data succeeds. Suspecting a data problem, I hand-wrote a few rows, but the build failed with the same error, which shows that the original data is not the cause.
The error message shown on the web page is as follows:
java.io.IOException: OS command error exit with return code: 1, error message: che.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:498)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:501)
... 3 more

}
RetryInfo{
   overrideConf : {},
   throwable : java.lang.RuntimeException: Error execute org.apache.kylin.engine.spark.job.CubeBuildJob
at org.apache.kylin.engine.spark.application.SparkApplication.execute(SparkApplication.java:96)
at org.apache.spark.application.JobWorker$$anon$2.run(JobWorker.scala:55)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 74.0 failed 4 times, most recent failure: Lost task 0.3 in stage 74.0 (TID 186) (store2 executor 20): java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hive.conf.HiveConf$ConfVars
at org.apache.hadoop.hive.serde2.lazy.LazySerDeParameters.<init>(LazySerDeParameters.java:103)
at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.initialize(LazySimpleSerDe.java:125)
at org.apache.spark.sql.hive.HadoopTableReader.$anonfun$makeRDDForTable$3(TableReader.scala:136)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:863)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:863)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:498)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:501)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2303)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2252)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2251)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2251)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1124)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1124)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1124)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2490)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2432)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2421)
at 
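
A `NoClassDefFoundError: Could not initialize class ...` generally means the JVM located the class but its static initialization had already failed, which is often (though not always) a sign of missing or conflicting jars on the executor classpath. As a first diagnostic step, one could list which jars ship the class named in the trace. The following is a minimal sketch under that assumption; the helper name `jars_containing` and the example directories are hypothetical and not part of Kylin, Spark, or Hive.

```python
import zipfile
from pathlib import Path


def jars_containing(class_name, directories):
    """Return the jar files under `directories` that contain `class_name`.

    class_name:  fully qualified JVM class name, for example
                 "org.apache.hadoop.hive.conf.HiveConf$ConfVars"
    directories: iterable of directories to search recursively
    """
    # Inside a jar, a class is stored as a path with '/' separators
    # and a ".class" suffix.
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for directory in directories:
        root = Path(directory)
        if not root.is_dir():
            continue  # skip directories that do not exist on this host
        for jar in sorted(root.rglob("*.jar")):
            try:
                with zipfile.ZipFile(jar) as archive:
                    if entry in archive.namelist():
                        hits.append(jar)
            except (zipfile.BadZipFile, OSError):
                continue  # unreadable or corrupt jar; ignore it
    return hits
```

For example, calling `jars_containing("org.apache.hadoop.hive.conf.HiveConf$ConfVars", ["/path/to/kylin/spark/jars", "/path/to/hive/lib"])` (both paths are placeholders) would show whether the class is provided by more than one jar, which would be worth ruling out before looking at the data itself.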

Re: Pinot/Kylin/Druid quick comparision

2023-12-04 Thread Xiaoxiang Yu
A JIRA ticket has been opened and is waiting for INFRA:
https://issues.apache.org/jira/browse/INFRA-25238

With warm regard
Xiaoxiang Yu



On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy  wrote:

> Thank you Xiaoxiang, please update me when you have changed your default
> branch. In case people are impressed by the numbers then I hope to turn
> this situation to reverse direction.
>
> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu  wrote:
>
>> The default branch is for 4.x, which is in maintenance mode; the active
>> development branch is kylin5.
>> I will change the default branch to kylin5 later.
>>
>> 
>> With warm regard
>> Xiaoxiang Yu
>>
>>
>>
>> On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy  wrote:
>>
>>> Hi Xiaoxiang, Sirs / Madams
>>>
>>> Can you see the attached photo?
>>>
>>> My boss asked why Druid commits code regularly while Kylin has had no
>>> commits since July.
>>>
>>>
>>> On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu  wrote:
>>>
 I think so.

 Response time is not the only factor in the decision. Kylin could be
 cheaper when the query pattern suits the Kylin model, and Kylin can
 guarantee reasonable query latency. ClickHouse will be quicker in an ad hoc
 query scenario.

 By the way, Youzan and Kyligence combine them together to provide
 unified data analytics services for their customers.

 
 With warm regard
 Xiaoxiang Yu



 On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy wrote:

> Hi Xiaoxiang, thank you
>
> In case my client uses cloud computing service like gcp or aws, which
> will cost more: precalculation feature of kylin or clickhouse (incase
> of
> kylin, I have a thought that the query execution has been done once and
> stored in cube to be used many times so kylin uses less cloud
> computation,
> is that true)?
>
> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu  wrote:
>
> > Following text is part of an article(
> > https://zhuanlan.zhihu.com/p/343394287) .
> >
> >
> >
> ===
> >
> > Kylin is suitable for aggregation queries with fixed modes because
> of its
> > pre-calculated technology, for example, join, group by, and where
> condition
> > modes in SQL are relatively fixed, etc. The larger the data volume
> is, the
> > more obvious the advantages of using Kylin are; in particular, Kylin
> is
> > particularly advantageous in the scenarios of de-emphasis (count
> distinct),
> > Top N, and Percentile. In particular, Kylin's advantages in
> de-weighting
> > (count distinct), Top N, Percentile and other scenarios are
> especially
> > huge, and it is used in a large number of scenarios, such as
> Dashboard, all
> > kinds of reports, large-screen display, traffic statistics, and user
> > behavior analysis. Meituan, Aurora, Shell Housing, etc. use Kylin to
> build
> > their data service platforms, providing millions to tens of millions
> of
> > queries per day, and most of the queries can be completed within 2 -
> 3
> > seconds. There is no better alternative for such a high concurrency
> > scenario.
> >
> > ClickHouse, because of its MPP architecture, has high computing
> power and
> > is more suitable when the query request is more flexible, or when
> there is
> > a need for detailed queries with low concurrency. Scenarios include:
> very
> > many columns and where conditions are arbitrarily combined with the
> user
> > label filtering, not a large amount of concurrency of complex
> on-the-spot
> > query and so on. If the amount of data and access is large, you need
> to
> > deploy a distributed ClickHouse cluster, which is a higher challenge
> for
> > operation and maintenance.
> >
> > If some queries are very flexible but infrequent, it is more
> > resource-efficient to use now-computing. Since the number of queries
> is
> > small, even if each query consumes a lot of computational resources,
> it is
> > still cost-effective overall. If some queries have a fixed pattern
> and the
> > query volume is large, it is more suitable for Kylin, because the
> query
> > volume is large, and by using large computational resources to save
> the
> > results, the upfront computational cost can be amortized over each
> query,
> > so it is the most economical.
> >
> > --- Translated with DeepL.com (free version)
> >
> >
> > 
> > With warm regard
> > Xiaoxiang Yu
> >
> >
> >
> > On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy 
> wrote:
> >
> >> Thank you Xiaoxiang for the near real time streaming feature. That's

Re: Pinot/Kylin/Druid quick comparision

2023-12-04 Thread Xiaoxiang Yu
I think so.

Response time is not the only factor in the decision. Kylin could be cheaper
when the query pattern suits the Kylin model, and Kylin can guarantee
reasonable query latency. ClickHouse will be quicker in an ad hoc query
scenario.

By the way, Youzan and Kyligence combine the two to provide
unified data analytics services for their customers.


With warm regard
Xiaoxiang Yu



On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy  wrote:

> Hi Xiaoxiang, thank you
>
> In case my client uses cloud computing service like gcp or aws, which
> will cost more: precalculation feature of kylin or clickhouse (incase of
> kylin, I have a thought that the query execution has been done once and
> stored in cube to be used many times so kylin uses less cloud computation,
> is that true)?
>
> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu  wrote:
>
> > Following text is part of an article(
> > https://zhuanlan.zhihu.com/p/343394287) .
> >
> >
> >
> ===
> >
> > Kylin is suitable for aggregation queries with fixed modes because of its
> > pre-calculated technology, for example, join, group by, and where
> condition
> > modes in SQL are relatively fixed, etc. The larger the data volume is,
> the
> > more obvious the advantages of using Kylin are; in particular, Kylin is
> > particularly advantageous in the scenarios of de-emphasis (count
> distinct),
> > Top N, and Percentile. In particular, Kylin's advantages in de-weighting
> > (count distinct), Top N, Percentile and other scenarios are especially
> > huge, and it is used in a large number of scenarios, such as Dashboard,
> all
> > kinds of reports, large-screen display, traffic statistics, and user
> > behavior analysis. Meituan, Aurora, Shell Housing, etc. use Kylin to
> build
> > their data service platforms, providing millions to tens of millions of
> > queries per day, and most of the queries can be completed within 2 - 3
> > seconds. There is no better alternative for such a high concurrency
> > scenario.
> >
> > ClickHouse, because of its MPP architecture, has high computing power and
> > is more suitable when the query request is more flexible, or when there
> is
> > a need for detailed queries with low concurrency. Scenarios include: very
> > many columns and where conditions are arbitrarily combined with the user
> > label filtering, not a large amount of concurrency of complex on-the-spot
> > query and so on. If the amount of data and access is large, you need to
> > deploy a distributed ClickHouse cluster, which is a higher challenge for
> > operation and maintenance.
> >
> > If some queries are very flexible but infrequent, it is more
> > resource-efficient to use now-computing. Since the number of queries is
> > small, even if each query consumes a lot of computational resources, it
> is
> > still cost-effective overall. If some queries have a fixed pattern and
> the
> > query volume is large, it is more suitable for Kylin, because the query
> > volume is large, and by using large computational resources to save the
> > results, the upfront computational cost can be amortized over each query,
> > so it is the most economical.
> >
> > --- Translated with DeepL.com (free version)
> >
> >
> > 
> > With warm regard
> > Xiaoxiang Yu
> >
> >
> >
> > On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy 
> wrote:
> >
> >> Thank you Xiaoxiang for the near real time streaming feature. That's
> >> great.
> >>
> >> This morning there has been a new challenge to my team: clickhouse
> offered
> >> us the speed of calculating 8 billion rows in millisecond which is
> faster
> >> than my demonstration (I used Kylin to do calculating 1 billion rows in
> >> 2.9
> >> seconds)
> >>
> >> Can you briefly suggest the advantages of kylin over clickhouse so that
> I
> >> can defend my demonstration.
> >>
> >> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu  wrote:
> >>
> >> > 1. "In this important scenario of realtime analytics, the reason here
> is
> >> > that
> >> > kylin has lag time due to model update of new segment build, is that
> >> > correct?"
> >> >
> >> > You are correct.
> >> >
> >> > 2. "If that is true, then can you suggest a work-around of combination
> >> of
> >> > ... "
> >> >
> >> > Kylin is planning to introduce NRT streaming(coding is completed but
> not
> >> > released),
> >> > which can make the time-lag to about 3 minutes(that is my estimation
> >> but I
> >> > am
> >> > quite certain about it).
> >> > NRT stands for 'near real-time', it will run a job and do micro-batch
> >> > aggregation and persistence periodically. The price is that you need
> to
> >> run
> >> > and monitor a long-running
> >> >  job. This feature is based on Spark Streaming, so you need knowledge
> of
> >> > it.
> >> >
> >> > I am curious about what is the maximum time-lag your customers
> >> > can tolerate?
> >> > Personally, I guess minute level time-lag is ok for 

Re: Pinot/Kylin/Druid quick comparision

2023-12-04 Thread Nam Đỗ Duy via user
Hi Xiaoxiang, thank you

If my client uses a cloud computing service such as GCP or AWS, which
will cost more: Kylin's precalculation or ClickHouse? (In the case of
Kylin, my thinking is that the query execution is done once and the result
is stored in the cube to be reused many times, so Kylin uses less cloud
computation. Is that true?)

On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu  wrote:

> Following text is part of an article(
> https://zhuanlan.zhihu.com/p/343394287) .
>
>
> ===
>
> Kylin is suitable for aggregation queries with fixed modes because of its
> pre-calculated technology, for example, join, group by, and where condition
> modes in SQL are relatively fixed, etc. The larger the data volume is, the
> more obvious the advantages of using Kylin are; in particular, Kylin is
> particularly advantageous in the scenarios of de-emphasis (count distinct),
> Top N, and Percentile. In particular, Kylin's advantages in de-weighting
> (count distinct), Top N, Percentile and other scenarios are especially
> huge, and it is used in a large number of scenarios, such as Dashboard, all
> kinds of reports, large-screen display, traffic statistics, and user
> behavior analysis. Meituan, Aurora, Shell Housing, etc. use Kylin to build
> their data service platforms, providing millions to tens of millions of
> queries per day, and most of the queries can be completed within 2 - 3
> seconds. There is no better alternative for such a high concurrency
> scenario.
>
> ClickHouse, because of its MPP architecture, has high computing power and
> is more suitable when the query request is more flexible, or when there is
> a need for detailed queries with low concurrency. Scenarios include: very
> many columns and where conditions are arbitrarily combined with the user
> label filtering, not a large amount of concurrency of complex on-the-spot
> query and so on. If the amount of data and access is large, you need to
> deploy a distributed ClickHouse cluster, which is a higher challenge for
> operation and maintenance.
>
> If some queries are very flexible but infrequent, it is more
> resource-efficient to use now-computing. Since the number of queries is
> small, even if each query consumes a lot of computational resources, it is
> still cost-effective overall. If some queries have a fixed pattern and the
> query volume is large, it is more suitable for Kylin, because the query
> volume is large, and by using large computational resources to save the
> results, the upfront computational cost can be amortized over each query,
> so it is the most economical.
>
> --- Translated with DeepL.com (free version)
>
>
> 
> With warm regard
> Xiaoxiang Yu
>
>
>
> On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy  wrote:
>
>> Thank you Xiaoxiang for the near real time streaming feature. That's
>> great.
>>
>> This morning there has been a new challenge to my team: clickhouse offered
>> us the speed of calculating 8 billion rows in millisecond which is faster
>> than my demonstration (I used Kylin to do calculating 1 billion rows in
>> 2.9
>> seconds)
>>
>> Can you briefly suggest the advantages of kylin over clickhouse so that I
>> can defend my demonstration.
>>
>> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu  wrote:
>>
>> > 1. "In this important scenario of realtime analytics, the reason here is
>> > that
>> > kylin has lag time due to model update of new segment build, is that
>> > correct?"
>> >
>> > You are correct.
>> >
>> > 2. "If that is true, then can you suggest a work-around of combination
>> of
>> > ... "
>> >
>> > Kylin is planning to introduce NRT streaming(coding is completed but not
>> > released),
>> > which can make the time-lag to about 3 minutes(that is my estimation
>> but I
>> > am
>> > quite certain about it).
>> > NRT stands for 'near real-time', it will run a job and do micro-batch
>> > aggregation and persistence periodically. The price is that you need to
>> run
>> > and monitor a long-running
>> >  job. This feature is based on Spark Streaming, so you need knowledge of
>> > it.
>> >
>> > I am curious about what is the maximum time-lag your customers
>> > can tolerate?
>> > Personally, I guess minute level time-lag is ok for most cases.
>> >
>> > 
>> > With warm regard
>> > Xiaoxiang Yu
>> >
>> >
>> >
>> > On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy 
>> wrote:
>> >
>> > > Druid is better in
>> > > - Have a real-time datasource like Kafka etc.
>> > >
>> > > ==
>> > >
>> > > Hi Xiaoxiang, thank you for your response.
>> > >
>> > > In this important scenario of realtime analytics, the reason here is
>> that
>> > > kylin has lag time due to model update of new segment build, is that
>> > > correct?
>> > >
>> > > If that is true, then can you suggest a work-around of combination of
>> :
>> > >
>> > > (time - lag kylin cube) + (realtime DB update) to provide
>> > > realtime capability ?

Re: Pinot/Kylin/Druid quick comparision

2023-12-03 Thread Xiaoxiang Yu
The following text is part of an article
(https://zhuanlan.zhihu.com/p/343394287).

===

Kylin is suitable for aggregation queries with fixed patterns because of its
precomputation technology, for example when the join, group by, and where
condition patterns in SQL are relatively fixed. The larger the data volume,
the more obvious the advantages of using Kylin. Kylin's advantages are
especially large in deduplication (count distinct), Top N, and Percentile
scenarios, and it is widely used for dashboards, all kinds of reports,
large-screen displays, traffic statistics, and user behavior analysis.
Meituan, Aurora, Shell Housing, etc. use Kylin to build their data service
platforms, serving millions to tens of millions of queries per day, with
most queries completing within 2 - 3 seconds. There is no better alternative
for such a high-concurrency scenario.

ClickHouse, because of its MPP architecture, has high computing power and
is more suitable when query requests are flexible, or when detailed queries
are needed at low concurrency. Scenarios include user-label filtering, where
very many columns and where conditions are combined arbitrarily, and complex
ad hoc queries with modest concurrency. If the data volume and access volume
are large, you need to deploy a distributed ClickHouse cluster, which poses a
greater challenge for operation and maintenance.

If some queries are very flexible but infrequent, it is more
resource-efficient to compute them on the fly. Since the number of queries
is small, even if each query consumes a lot of computational resources, it is
still cost-effective overall. If some queries have a fixed pattern and the
query volume is large, Kylin is more suitable: by spending substantial
computational resources up front to save the results, the precomputation
cost can be amortized over each query, so it is the most economical.

--- Translated with DeepL.com (free version)
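
The amortization argument in the excerpt above can be made concrete with a
little arithmetic. The sketch below is only an illustration: the cost figures
are made-up units, not measurements of Kylin or ClickHouse, and the function
names are hypothetical.

```python
def total_cost_precomputed(build_cost, per_query_cost, num_queries):
    """Total cost when results are precomputed once and then looked up."""
    return build_cost + per_query_cost * num_queries


def total_cost_on_the_fly(per_query_cost, num_queries):
    """Total cost when every query is computed from the raw data."""
    return per_query_cost * num_queries


def break_even_queries(build_cost, cheap_query_cost, expensive_query_cost):
    """Smallest number of queries at which precomputation is strictly cheaper."""
    saving_per_query = expensive_query_cost - cheap_query_cost
    if saving_per_query <= 0:
        return None  # precomputation never pays off
    # Strictly cheaper once num_queries * saving_per_query > build_cost.
    return build_cost // saving_per_query + 1


# Hypothetical cost units: building the index costs 1000, a precomputed
# lookup costs 1, and an on-the-fly scan costs 50.
BUILD, CHEAP, EXPENSIVE = 1000, 1, 50
BREAK_EVEN = break_even_queries(BUILD, CHEAP, EXPENSIVE)
```

With these assumed numbers the break-even point is 21 queries: below it,
computing on the fly is cheaper overall; above it, the upfront build cost is
spread thinly enough that precomputation wins.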



With warm regard
Xiaoxiang Yu



On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy  wrote:

> Thank you Xiaoxiang for the near real time streaming feature. That's great.
>
> This morning there has been a new challenge to my team: clickhouse offered
> us the speed of calculating 8 billion rows in millisecond which is faster
> than my demonstration (I used Kylin to do calculating 1 billion rows in 2.9
> seconds)
>
> Can you briefly suggest the advantages of kylin over clickhouse so that I
> can defend my demonstration.
>
> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu  wrote:
>
> > 1. "In this important scenario of realtime analytics, the reason here is
> > that
> > kylin has lag time due to model update of new segment build, is that
> > correct?"
> >
> > You are correct.
> >
> > 2. "If that is true, then can you suggest a work-around of combination of
> > ... "
> >
> > Kylin is planning to introduce NRT streaming(coding is completed but not
> > released),
> > which can make the time-lag to about 3 minutes(that is my estimation but
> I
> > am
> > quite certain about it).
> > NRT stands for 'near real-time', it will run a job and do micro-batch
> > aggregation and persistence periodically. The price is that you need to
> run
> > and monitor a long-running
> >  job. This feature is based on Spark Streaming, so you need knowledge of
> > it.
> >
> > I am curious about what is the maximum time-lag your customers
> > can tolerate?
> > Personally, I guess minute level time-lag is ok for most cases.
> >
> > 
> > With warm regard
> > Xiaoxiang Yu
> >
> >
> >
> > On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy 
> wrote:
> >
> > > Druid is better in
> > > - Have a real-time datasource like Kafka etc.
> > >
> > > ==
> > >
> > > Hi Xiaoxiang, thank you for your response.
> > >
> > > In this important scenario of realtime analytics, the reason here is
> that
> > > kylin has lag time due to model update of new segment build, is that
> > > correct?
> > >
> > > If that is true, then can you suggest a work-around of combination of :
> > >
> > > (time - lag kylin cube) + (realtime DB update) to provide
> > > realtime capability ?
> > >
> > > IMO, the point here is to find that (realtime DB update) and integrate
> it
> > > with (time - lag kylin cube).
> > >
> > > On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu  wrote:
> > >
> > > > I researched and tested Druid two years ago(I don't know too much
> about
> > > >  the change of Druid in these two years. New features that I know
> are :
> > > > new UI, fully on K8s etc).
> > > >
> > > > Here are some cases you should consider using Druid other than Kylin
> > > > at the moment (using Kylin 5.0-beta to compare the Druid which I 

Re: Pinot/Kylin/Druid quick comparision

2023-12-03 Thread Nam Đỗ Duy via user
Thank you Xiaoxiang for the information on the near real-time streaming
feature. That's great.

This morning my team faced a new challenge: ClickHouse was presented to us
as computing over 8 billion rows in milliseconds, which is faster than my
demonstration (I used Kylin to compute over 1 billion rows in 2.9 seconds).

Can you briefly suggest the advantages of Kylin over ClickHouse so that I
can defend my demonstration?

On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu  wrote:

> 1. "In this important scenario of realtime analytics, the reason here is
> that
> kylin has lag time due to model update of new segment build, is that
> correct?"
>
> You are correct.
>
> 2. "If that is true, then can you suggest a work-around of combination of
> ... "
>
> Kylin is planning to introduce NRT streaming(coding is completed but not
> released),
> which can make the time-lag to about 3 minutes(that is my estimation but I
> am
> quite certain about it).
> NRT stands for 'near real-time', it will run a job and do micro-batch
> aggregation and persistence periodically. The price is that you need to run
> and monitor a long-running
>  job. This feature is based on Spark Streaming, so you need knowledge of
> it.
>
> I am curious about what is the maximum time-lag your customers
> can tolerate?
> Personally, I guess minute level time-lag is ok for most cases.
>
> 
> With warm regard
> Xiaoxiang Yu
>
>
>
> On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy  wrote:
>
> > Druid is better in
> > - Have a real-time datasource like Kafka etc.
> >
> > ==
> >
> > Hi Xiaoxiang, thank you for your response.
> >
> > In this important scenario of realtime analytics, the reason here is that
> > kylin has lag time due to model update of new segment build, is that
> > correct?
> >
> > If that is true, then can you suggest a work-around of combination of :
> >
> > (time - lag kylin cube) + (realtime DB update) to provide
> > realtime capability ?
> >
> > IMO, the point here is to find that (realtime DB update) and integrate it
> > with (time - lag kylin cube).
> >
> > On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu  wrote:
> >
> > > I researched and tested Druid two years ago(I don't know too much about
> > >  the change of Druid in these two years. New features that I know are :
> > > new UI, fully on K8s etc).
> > >
> > > Here are some cases you should consider using Druid other than Kylin
> > > at the moment (using Kylin 5.0-beta to compare the Druid which I used
> two
> > > years ago):
> > >
> > > - Have a real-time datasource like Kafka etc.
> > > - Most queries are small(Based on my test result, I think Druid had
> > better
> > > response time for small queries two years ago.)
> > > - Don't know how to optimize Spark/Hadoop, want to use the K8S/public
> > >   cloud platform as your deployment platform.
> > >
> > > But I do think there are many scenarios in which Kylin could be better,
> > > like:
> > >
> > > - Better performance for complex/big queries. Kylin can have a more
> > > exact-match/fine-grained
> > >   Index for queries containing different `Group By dimensions`.
> > > - User-friendly UI for modeling.
> > > - Support 'Join' better? (Not sure at the moment)
> > > - ODBC driver for different BI.(its website did not show it supports
> ODBC
> > > well)
> > > - Looks like Kylin supports ANSI SQL better than Druid.
> > >
> > >
> > > I don't know Pinot, so I have nothing to say about it.
> > > Hope to help you, or you are free to share your opinion.
> > >
> > > 
> > > With warm regard
> > > Xiaoxiang Yu
> > >
> > >
> > >
> > > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy 
> > wrote:
> > >
> > >> Dear Xiaoxiang,
> > >> Sirs/Madams,
> > >>
> > >> May I post my boss's question:
> > >>
> > >> What are the pros and cons of the OLAP platform Kylin compared to
> Pinot
> > >> and
> > >> Druid?
> > >>
> > >> Please kindly let me know
> > >>
> > >> Thank you very much and best regards
> > >>
> > >
> >
>


Re: Pinot/Kylin/Druid quick comparision

2023-12-03 Thread Xiaoxiang Yu
1. "In this important scenario of realtime analytics, the reason here is
that
kylin has lag time due to model update of new segment build, is that
correct?"

You are correct.

2. "If that is true, then can you suggest a work-around of combination of
... "

Kylin is planning to introduce NRT streaming (coding is completed but not
released), which can reduce the time lag to about 3 minutes (that is my
estimate, but I am quite certain about it).
NRT stands for 'near real-time'. It runs a job that periodically performs
micro-batch aggregation and persistence. The price is that you need to run
and monitor a long-running job. This feature is based on Spark Streaming, so
you need some knowledge of it.

I am curious: what is the maximum time lag your customers can tolerate?
Personally, I guess a minute-level time lag is OK for most cases.
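
To illustrate what "micro-batch aggregation and persistence" means in general,
here is a minimal sketch in plain Python. It is not Kylin's NRT implementation
and does not use Spark Streaming; the event layout, the function names, and
the 180-second window (chosen only to mirror the roughly 3-minute estimate
above) are all assumptions.

```python
from collections import defaultdict


def micro_batches(events, batch_seconds):
    """Group (timestamp, key, value) events into fixed-length time windows."""
    batches = defaultdict(list)
    for ts, key, value in events:
        window_start = (ts // batch_seconds) * batch_seconds
        batches[window_start].append((key, value))
    return dict(sorted(batches.items()))


def aggregate_batch(batch):
    """Pre-aggregate one micro-batch: sum of values per key."""
    totals = defaultdict(int)
    for key, value in batch:
        totals[key] += value
    return dict(totals)


def run(events, batch_seconds):
    """Aggregate each window and 'persist' it (here: collect it in a dict)."""
    store = {}
    for window_start, batch in micro_batches(events, batch_seconds).items():
        store[window_start] = aggregate_batch(batch)
    return store


# Hypothetical events: (epoch seconds, dimension value, measure)
EVENTS = [(0, "a", 1), (10, "b", 2), (170, "a", 3), (185, "a", 4), (200, "b", 5)]
RESULT = run(EVENTS, batch_seconds=180)
```

Queries then read only the persisted per-window aggregates, so the freshest
data they can see is at most one window old. That is where a minute-level
time lag comes from in this kind of design.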


With warm regard
Xiaoxiang Yu



On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy  wrote:

> Druid is better in
> - Have a real-time datasource like Kafka etc.
>
> ==
>
> Hi Xiaoxiang, thank you for your response.
>
> In this important scenario of realtime analytics, the reason here is that
> kylin has lag time due to model update of new segment build, is that
> correct?
>
> If that is true, then can you suggest a work-around of combination of :
>
> (time - lag kylin cube) + (realtime DB update) to provide
> realtime capability ?
>
> IMO, the point here is to find that (realtime DB update) and integrate it
> with (time - lag kylin cube).
>
> On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu  wrote:
>
> > I researched and tested Druid two years ago(I don't know too much about
> >  the change of Druid in these two years. New features that I know are :
> > new UI, fully on K8s etc).
> >
> > Here are some cases you should consider using Druid other than Kylin
> > at the moment (using Kylin 5.0-beta to compare the Druid which I used two
> > years ago):
> >
> > - Have a real-time datasource like Kafka etc.
> > - Most queries are small(Based on my test result, I think Druid had
> better
> > response time for small queries two years ago.)
> > - Don't know how to optimize Spark/Hadoop, want to use the K8S/public
> >   cloud platform as your deployment platform.
> >
> > But I do think there are many scenarios in which Kylin could be better,
> > like:
> >
> > - Better performance for complex/big queries. Kylin can have a more
> > exact-match/fine-grained
> >   Index for queries containing different `Group By dimensions`.
> > - User-friendly UI for modeling.
> > - Support 'Join' better? (Not sure at the moment)
> > - ODBC driver for different BI.(its website did not show it supports ODBC
> > well)
> > - Looks like Kylin supports ANSI SQL better than Druid.
> >
> >
> > I don't know Pinot, so I have nothing to say about it.
> > Hope to help you, or you are free to share your opinion.
> >
> > 
> > With warm regard
> > Xiaoxiang Yu
> >
> >
> >
> > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy 
> wrote:
> >
> >> Dear Xiaoxiang,
> >> Sirs/Madams,
> >>
> >> May I post my boss's question:
> >>
> >> What are the pros and cons of the OLAP platform Kylin compared to Pinot
> >> and
> >> Druid?
> >>
> >> Please kindly let me know
> >>
> >> Thank you very much and best regards
> >>
> >
>


Re: Pinot/Kylin/Druid quick comparision

2023-12-03 Thread Nam Đỗ Duy via user
Druid is better in
- Have a real-time datasource like Kafka etc.

==

Hi Xiaoxiang, thank you for your response.

In this important scenario of realtime analytics, the reason here is that
Kylin has lag time due to the model update of a new segment build, is that
correct?

If that is true, can you suggest a workaround that combines:

(time-lagged Kylin cube) + (realtime DB update) to provide
realtime capability?

IMO, the point here is to find that (realtime DB update) component and
integrate it with the (time-lagged Kylin cube).

On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu  wrote:

> I researched and tested Druid two years ago(I don't know too much about
>  the change of Druid in these two years. New features that I know are :
> new UI, fully on K8s etc).
>
> Here are some cases you should consider using Druid other than Kylin
> at the moment (using Kylin 5.0-beta to compare the Druid which I used two
> years ago):
>
> - Have a real-time datasource like Kafka etc.
> - Most queries are small(Based on my test result, I think Druid had better
> response time for small queries two years ago.)
> - Don't know how to optimize Spark/Hadoop, want to use the K8S/public
>   cloud platform as your deployment platform.
>
> But I do think there are many scenarios in which Kylin could be better,
> like:
>
> - Better performance for complex/big queries. Kylin can have a more
> exact-match/fine-grained
>   Index for queries containing different `Group By dimensions`.
> - User-friendly UI for modeling.
> - Support 'Join' better? (Not sure at the moment)
> - ODBC driver for different BI.(its website did not show it supports ODBC
> well)
> - Looks like Kylin supports ANSI SQL better than Druid.
>
>
> I don't know Pinot, so I have nothing to say about it.
> Hope to help you, or you are free to share your opinion.
>
> 
> With warm regard
> Xiaoxiang Yu
>
>
>
> On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy  wrote:
>
>> Dear Xiaoxiang,
>> Sirs/Madams,
>>
>> May I post my boss's question:
>>
>> What are the pros and cons of the OLAP platform Kylin compared to Pinot
>> and
>> Druid?
>>
>> Please kindly let me know
>>
>> Thank you very much and best regards
>>
>


Re: Pinot/Kylin/Druid quick comparision

2023-11-30 Thread Xiaoxiang Yu
I researched and tested Druid two years ago (I don't know much about how
Druid has changed in these two years; the new features I know of are a new
UI, running fully on K8s, etc.).

Here are some cases in which you should consider using Druid rather than
Kylin at the moment (comparing Kylin 5.0-beta with the Druid I used two
years ago):

- You have a real-time datasource like Kafka.
- Most queries are small (based on my test results, I think Druid had better
  response time for small queries two years ago).
- You don't know how to optimize Spark/Hadoop and want to use K8s or a public
  cloud platform as your deployment platform.

But I do think there are many scenarios in which Kylin could be better,
like:

- Better performance for complex/big queries. Kylin can have a more
  exact-match/fine-grained index for queries containing different
  `Group By` dimensions.
- User-friendly UI for modeling.
- Better 'Join' support? (Not sure at the moment.)
- ODBC driver for different BI tools (Druid's website did not show that it
  supports ODBC well).
- Kylin appears to support ANSI SQL better than Druid.
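
The idea of an exact-match index per `Group By` combination can be sketched
as follows. This is a toy model in plain Python, not Kylin's index format or
its matching algorithm; the table, the column names, and the helper functions
are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (country, device, amount)
ROWS = [
    ("VN", "mobile", 10),
    ("VN", "desktop", 5),
    ("CN", "mobile", 7),
    ("CN", "mobile", 3),
]
DIMENSIONS = ("country", "device")


def build_index(rows, dims):
    """Pre-aggregate SUM(amount) grouped by the given dimensions."""
    positions = [DIMENSIONS.index(d) for d in dims]
    index = defaultdict(int)
    for row in rows:
        index[tuple(row[p] for p in positions)] += row[-1]
    return dict(index)


# One pre-built index per dimension combination.
INDEXES = {
    dims: build_index(ROWS, dims)
    for dims in [("country",), ("country", "device")]
}


def pick_index(group_by):
    """Choose the smallest index whose dimensions cover the GROUP BY columns."""
    candidates = [d for d in INDEXES if set(group_by) <= set(d)]
    if not candidates:
        return None  # no index matches; fall back to scanning raw rows
    return min(candidates, key=lambda d: len(INDEXES[d]))
```

A query grouped by `country` alone is served from the two-row `("country",)`
index, while a query grouped by `device` has to use the larger
`("country", "device")` index and aggregate further. The finer the match
between index and query, the less work is left at query time.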


I don't know Pinot, so I have nothing to say about it.
I hope this helps; you are also free to share your opinion.


With warm regard
Xiaoxiang Yu



On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy  wrote:

> Dear Xiaoxiang,
> Sirs/Madams,
>
> May I post my boss's question:
>
> What are the pros and cons of the OLAP platform Kylin compared to Pinot and
> Druid?
>
> Please kindly let me know
>
> Thank you very much and best regards
>


Pinot/Kylin/Druid quick comparision

2023-11-30 Thread Nam Đỗ Duy via user
Dear Xiaoxiang,
Sirs/Madams,

May I post my boss's question:

What are the pros and cons of the OLAP platform Kylin compared to Pinot and
Druid?

Please kindly let me know

Thank you very much and best regards


Re: How to use measure of Kylin in query

2023-11-22 Thread Xiaoxiang Yu
Your summary
"I write query normally to query the desired column and Kylin uses the
index mechanism to accelerate query"
is almost right. However, I do not understand exactly what your question is.




With warm regard
Xiaoxiang Yu



On Wed, Nov 22, 2023 at 5:32 PM Nam Đỗ Duy  wrote:

> Dear Sir / Madam
>
> I've searched the web but cannot find the way to use measures of Kylin, for
> example, with this  quote from the URL of document, it seems that the
> measure's magic is as follows: "I write query normally to query the desired
> column and Kylin uses the index mechanism to accelerate query", can you
> please advise?
>
> Count Distinct (Precise) | Welcome to Kylin 5 (apache.org)
> <
> https://kylin.apache.org/5.0/docs/modeling/model_design/measure_design/count_distinct_bitmap
> >
>
> Once the measure is added and the model is saved, you need to go to the
> Edit
> Aggregate Index page, add the corresponding dimensions and measures to the
> appropriate aggregate group according to your business scenario, and the
> new aggregate index will be generated after submission. You need to build
> index and load data to complete the precomputation of the target column.
> You can check the job of Build Index in the Job Monitor page. After the
> index is built, you can use the Count Distinct (Precise) measure to do some
> querying.
>


How to use measure of Kylin in query

2023-11-22 Thread Nam Đỗ Duy via user
Dear Sir / Madam

I've searched the web but cannot find how to use Kylin measures in a query.
For example, from the documentation passage quoted below, it seems that the
measure's magic works as follows: "I write the query normally to query the
desired column, and Kylin uses the index mechanism to accelerate the query".
Can you please advise?

Count Distinct (Precise) | Welcome to Kylin 5 (apache.org)
<https://kylin.apache.org/5.0/docs/modeling/model_design/measure_design/count_distinct_bitmap>

Once the measure is added and the model is saved, you need to go to the Edit
Aggregate Index page, add the corresponding dimensions and measures to the
appropriate aggregate group according to your business scenario, and the
new aggregate index will be generated after submission. You need to build
index and load data to complete the precomputation of the target column.
You can check the job of Build Index in the Job Monitor page. After the
index is built, you can use the Count Distinct (Precise) measure to do some
querying.


Re: Kylin real usecase on AI/ML (data science) project

2023-11-13 Thread Xiaoxiang Yu
Yes, you are right.




--

Best wishes to you ! 
From :Xiaoxiang Yu




At 2023-11-13 17:57:59, "Nam Đỗ Duy via user"  wrote:

Thank you Xiaoxiang, 


1. For my question of near real time data: this scenario is not about querying 
the cube (index), I am mentioning the query against the Hive table only: is 
that possible to instantly querying the non_cube data if the data is already in 
Hive?


Best regards


On Mon, Nov 13, 2023 at 4:23 PM Xiaoxiang Yu  wrote:

1.  Query them instantly is not possible, you need to trigger a build job and 
wait it completed, it will cost about 5-30 mintues in most cases. So 
the delay caused by Kylin is 5-30 minites.

2. DS/AI can send SQL query using Python and get the result(if kylinpy works 
well), just like you do in Kylin insight window.










--

Best wishes to you ! 
From :Xiaoxiang Yu




At 2023-11-13 17:09:59, "Nam Đỗ Duy via user"  wrote:

Thank you Xiaoxiang for answering my previous question


1. For previous question 1, if I can ingest data near real-time into Hive 
table, can that near realtime data be queried in Kylin insights windows by SQL 
query almost instantly? If not then how can I reflect near realtime data in 
(Kylin insights Window as well as in PowerBI report which connect to Kylin via 
mez)?


2. For previous question 2, if DS/AI team cannot access Kylin parquet file via 
java/python/scala then can they:


2.1) access the Hive Star schema table?
2.2) access kylin cube via API?
2.3) access computed fields of kylin cube via API
2.4 access kylin model's  measures via API


Thank you very much


On Mon, Nov 13, 2023 at 3:53 PM Xiaoxiang Yu  wrote:

Hi,
Question 1:
You are almost right.
If the Cube not ready, Kylin will use SparkSQL to execute query directly on 
original tables. 


Question 2:
It is possible but very hard.
The index data are saved in Parquet format, it is possible to read them by 
Spark, but the columns' name are encoded
 so you don't understand which columns are useful to you. The mapping from 
parquet files' 
columns to Model's dimensions or measures is stored Kylin's metastore, so the 
knowledge of Kylin source code 
is required to make good use of model/index files when reading them directly.


If we have a Python library(like 
https://github.com/Kyligence/kylinpy/tree/master) which provide
 the ability that you can send SQL to Kylin. Will it be helpful to your Data 
science team? 
Following is an example.




```
>>> import sqlalchemy as sa
>>> import pandas as pd
>>> # kylinpy must be installed; it provides the "kylin://" SQLAlchemy dialect
>>> kylin_engine = sa.create_engine('kylin://ADMIN:KYLIN@sandbox/learn_kylin?timeout=60&is_debug=1')
>>> sql = 'select * from kylin_sales limit 10'
>>> pd.read_sql(sql, kylin_engine)
```










--

Best wishes to you ! 
From :Xiaoxiang Yu




At 2023-11-13 16:02:20, "Nam Đỗ Duy via user"  wrote:

Hi Xiaoxiang,


Basically you can imagine the scenario that there will be3 teams who will be 
using Kylin's Cube: 


a) Data analyst team (DA) who is using PowerBI (via ODBC or mez), superset to 
access kylin Cube as well.
b) Data science team (DS) who is using Pyspark, SparkML currently assessing 
HDFS and parquet directly as raw file.
c) AI team who is using various interfaces like Java, Python, Scala to assess 
HDFS and parquet directly as raw file.


I have two questions:


1) For team a) DA: when using the ODBC or mez connector, if the Cube not ready 
then I guess the PowerBI is accessing HIVE parquet file, is n't it?

2) For DS/AI team: you see they are accessing the raw hdfs/parquet then how can 
Hive/Kylin provide more merits to these teams? For this question, I imagine of 
OLAP speed or computed metrics etc but I am not sure so please advise


Thank you very much








On Mon, Nov 13, 2023 at 2:40 PM Xiaoxiang Yu  wrote:

Do you have any specific business scenario? Looks like there is 
not such real usecase at the moment. 







--

Best wishes to you ! 
From :Xiaoxiang Yu




At 2023-11-13 11:36:35, "Nam Đỗ Duy via user"  wrote:

Dear Sir/Madam


I am persuading my company to use kylin as olap platform so please kindly share 
with me (inbox me if you hesitate to share publicly) your real use-cases to 
help me answer our boss’s question:


1. Which companies are using kylin now
2. How do you use kylin’s capabilities in your AI/ML projects


Thank you very much for your valuable time and support

Re: Kylin real usecase on AI/ML (data science) project

2023-11-13 Thread Nam Đỗ Duy via user
Thank you Xiaoxiang,

1. Regarding my question about near-real-time data: this scenario is not
about querying the cube (index). I am asking about queries against the Hive
table only: is it possible to query the non-cube data instantly if the data
is already in Hive?

Best regards

On Mon, Nov 13, 2023 at 4:23 PM Xiaoxiang Yu  wrote:

> 1.  Query them instantly is not possible, you need to trigger a build job
> and wait it completed, it will cost about 5-30 mintues in most cases. So
> the delay caused by Kylin is 5-30 minites.
>
> 2. DS/AI can send SQL query using Python and get the result(if kylinpy
> works well), just like you do in Kylin insight window.
>
>
>
>
> --
> *Best wishes to you ! *
> *From :**Xiaoxiang Yu*
>
>
> At 2023-11-13 17:09:59, "Nam Đỗ Duy via user" 
> wrote:
>
> Thank you Xiaoxiang for answering my previous question
>
> 1. For previous question 1, if I can ingest data near real-time into Hive
> table, can that near realtime data be queried in Kylin insights windows by
> SQL query almost instantly? If not then how can I reflect near
> realtime data in (Kylin insights Window as well as in PowerBI report which
> connect to Kylin via mez)?
>
> 2. For previous question 2, if DS/AI team cannot access Kylin parquet file
> via java/python/scala then can they:
>
> 2.1) access the Hive Star schema table?
> 2.2) access kylin cube via API?
> 2.3) access computed fields of kylin cube via API
> 2.4 access kylin model's  measures via API
>
> Thank you very much
>
> On Mon, Nov 13, 2023 at 3:53 PM Xiaoxiang Yu  wrote:
>
>> Hi,
>> Question 1:
>> You are almost right.
>> If the Cube not ready, Kylin will use SparkSQL to execute query directly
>> on original tables.
>>
>> Question 2:
>> It is possible but very hard.
>> The index data are saved in Parquet format, it is possible to read them
>> by Spark, but the columns' name are encoded
>>  so you don't understand which columns are useful to you. The mapping
>> from parquet files'
>> columns to Model's dimensions or measures is stored Kylin's metastore, so
>> the knowledge of Kylin source code
>> is required to make good use of model/index files when reading them
>> directly.
>>
>> If we have a Python library(like
>> https://github.com/Kyligence/kylinpy/tree/master) which provide
>>  the ability that you can send SQL to Kylin. Will it be helpful to your
>> Data science team?
>> Following is an example.
>>
>>
>> ```
>> >>> import sqlalchemy as sa
>> >>> import pandas as pd
>> >>> # kylinpy must be installed; it provides the "kylin://" SQLAlchemy dialect
>> >>> kylin_engine = sa.create_engine('kylin://ADMIN:KYLIN@sandbox/learn_kylin?timeout=60&is_debug=1')
>> >>> sql = 'select * from kylin_sales limit 10'
>> >>> pd.read_sql(sql, kylin_engine)
>> ```
>>
>>
>>
>>
>> --
>> *Best wishes to you ! *
>> *From :**Xiaoxiang Yu*
>>
>>
>> At 2023-11-13 16:02:20, "Nam Đỗ Duy via user" 
>> wrote:
>>
>> Hi Xiaoxiang,
>>
>> Basically you can imagine the scenario that there will be3 teams who will
>> be using Kylin's Cube:
>>
>> a) Data analyst team (DA) who is using PowerBI (via ODBC or mez),
>> superset to access kylin Cube as well.
>> b) Data science team (DS) who is using Pyspark, SparkML currently
>> assessing HDFS and parquet directly as raw file.
>> c) AI team who is using various interfaces like Java, Python, Scala to
>> assess HDFS and parquet directly as raw file.
>>
>> I have two questions:
>>
>> 1) For team a) DA: when using the ODBC or mez connector, if the Cube not
>> ready then I guess the PowerBI is accessing HIVE parquet file, is n't it?
>> 2) For DS/AI team: you see they are accessing the raw hdfs/parquet then
>> how can Hive/Kylin provide more merits to these teams? For this question, I
>> imagine of OLAP speed or computed metrics etc but I am not sure so please
>> advise
>>
>> Thank you very much
>>
>>
>>
>>
>> On Mon, Nov 13, 2023 at 2:40 PM Xiaoxiang Yu  wrote:
>>
>>> Do you have any specific business scenario? Looks like there is
>>> not such real usecase at the moment.
>>>
>>>
>>>
>>> --
>>> *Best wishes to you ! *
>>> *From :**Xiaoxiang Yu*
>>>
>>>
>>> At 2023-11-13 11:36:35, "Nam Đỗ Duy via user" 
>>> wrote:
>>>
>>> Dear Sir/Madam
>>>
>>> I am persuading my company to use kylin as olap platform so please
>>> kindly share with me (inbox me if you hesitate to share publicly) your real
>>> use-cases to help me answer our boss’s question:
>>>
>>> 1. Which companies are using kylin now
>>> 2. How do you use kylin’s capabilities in your AI/ML projects
>>>
>>> Thank you very much for your valuable time and support
>>>
>>>


Re: Kylin real usecase on AI/ML (data science) project

2023-11-13 Thread Xiaoxiang Yu
1. Querying them instantly is not possible. You need to trigger a build job and
wait for it to complete, which takes about 5-30 minutes in most cases. So
the delay introduced by Kylin is 5-30 minutes.

2. The DS/AI teams can send SQL queries from Python and get the results (if
kylinpy works well), just as you do in the Kylin Insight window.










--

Best wishes to you ! 
From :Xiaoxiang Yu




At 2023-11-13 17:09:59, "Nam Đỗ Duy via user"  wrote:

Thank you Xiaoxiang for answering my previous question


1. For previous question 1, if I can ingest data near real-time into Hive 
table, can that near realtime data be queried in Kylin insights windows by SQL 
query almost instantly? If not then how can I reflect near realtime data in 
(Kylin insights Window as well as in PowerBI report which connect to Kylin via 
mez)?


2. For previous question 2, if DS/AI team cannot access Kylin parquet file via 
java/python/scala then can they:


2.1) access the Hive Star schema table?
2.2) access kylin cube via API?
2.3) access computed fields of kylin cube via API
2.4 access kylin model's  measures via API


Thank you very much


On Mon, Nov 13, 2023 at 3:53 PM Xiaoxiang Yu  wrote:

Hi,
Question 1:
You are almost right.
If the Cube not ready, Kylin will use SparkSQL to execute query directly on 
original tables. 


Question 2:
It is possible but very hard.
The index data are saved in Parquet format, it is possible to read them by 
Spark, but the columns' name are encoded
 so you don't understand which columns are useful to you. The mapping from 
parquet files' 
columns to Model's dimensions or measures is stored Kylin's metastore, so the 
knowledge of Kylin source code 
is required to make good use of model/index files when reading them directly.


If we have a Python library(like 
https://github.com/Kyligence/kylinpy/tree/master) which provide
 the ability that you can send SQL to Kylin. Will it be helpful to your Data 
science team? 
Following is an example.




```
>>> import sqlalchemy as sa
>>> import pandas as pd
>>> # kylinpy must be installed; it provides the "kylin://" SQLAlchemy dialect
>>> kylin_engine = sa.create_engine('kylin://ADMIN:KYLIN@sandbox/learn_kylin?timeout=60&is_debug=1')
>>> sql = 'select * from kylin_sales limit 10'
>>> pd.read_sql(sql, kylin_engine)
```










--

Best wishes to you ! 
From :Xiaoxiang Yu




At 2023-11-13 16:02:20, "Nam Đỗ Duy via user"  wrote:

Hi Xiaoxiang,


Basically you can imagine the scenario that there will be3 teams who will be 
using Kylin's Cube: 


a) Data analyst team (DA) who is using PowerBI (via ODBC or mez), superset to 
access kylin Cube as well.
b) Data science team (DS) who is using Pyspark, SparkML currently assessing 
HDFS and parquet directly as raw file.
c) AI team who is using various interfaces like Java, Python, Scala to assess 
HDFS and parquet directly as raw file.


I have two questions:


1) For team a) DA: when using the ODBC or mez connector, if the Cube not ready 
then I guess the PowerBI is accessing HIVE parquet file, is n't it?

2) For DS/AI team: you see they are accessing the raw hdfs/parquet then how can 
Hive/Kylin provide more merits to these teams? For this question, I imagine of 
OLAP speed or computed metrics etc but I am not sure so please advise


Thank you very much








On Mon, Nov 13, 2023 at 2:40 PM Xiaoxiang Yu  wrote:

Do you have any specific business scenario? Looks like there is 
not such real usecase at the moment. 







--

Best wishes to you ! 
From :Xiaoxiang Yu




At 2023-11-13 11:36:35, "Nam Đỗ Duy via user"  wrote:

Dear Sir/Madam


I am persuading my company to use kylin as olap platform so please kindly share 
with me (inbox me if you hesitate to share publicly) your real use-cases to 
help me answer our boss’s question:


1. Which companies are using kylin now
2. How do you use kylin’s capabilities in your AI/ML projects


Thank you very much for your valuable time and support

Re: Kylin real usecase on AI/ML (data science) project

2023-11-13 Thread Nam Đỗ Duy via user
Thank you Xiaoxiang for answering my previous question

1. Regarding previous question 1: if I can ingest data into a Hive table in
near real time, can that data be queried with SQL in the Kylin Insight window
almost instantly? If not, how can I reflect near-real-time data in the Kylin
Insight window and in Power BI reports that connect to Kylin via mez?

2. Regarding previous question 2: if the DS/AI teams cannot access the Kylin
Parquet files via Java/Python/Scala, can they:

2.1) access the Hive star schema tables?
2.2) access the Kylin cube via API?
2.3) access computed fields of the Kylin cube via API?
2.4) access the Kylin model's measures via API?

Thank you very much

On Mon, Nov 13, 2023 at 3:53 PM Xiaoxiang Yu  wrote:

> Hi,
> Question 1:
> You are almost right.
> If the Cube not ready, Kylin will use SparkSQL to execute query directly
> on original tables.
>
> Question 2:
> It is possible but very hard.
> The index data are saved in Parquet format, it is possible to read them by
> Spark, but the columns' name are encoded
>  so you don't understand which columns are useful to you. The mapping from
> parquet files'
> columns to Model's dimensions or measures is stored Kylin's metastore, so
> the knowledge of Kylin source code
> is required to make good use of model/index files when reading them
> directly.
>
> If we have a Python library(like
> https://github.com/Kyligence/kylinpy/tree/master) which provide
>  the ability that you can send SQL to Kylin. Will it be helpful to your
> Data science team?
> Following is an example.
>
>
> ```
> >>> import sqlalchemy as sa
> >>> import pandas as pd
> >>> # kylinpy must be installed; it provides the "kylin://" SQLAlchemy dialect
> >>> kylin_engine = sa.create_engine('kylin://ADMIN:KYLIN@sandbox/learn_kylin?timeout=60&is_debug=1')
> >>> sql = 'select * from kylin_sales limit 10'
> >>> pd.read_sql(sql, kylin_engine)
> ```
>
>
>
>
> --
> *Best wishes to you ! *
> *From :**Xiaoxiang Yu*
>
>
> At 2023-11-13 16:02:20, "Nam Đỗ Duy via user" 
> wrote:
>
> Hi Xiaoxiang,
>
> Basically you can imagine the scenario that there will be3 teams who will
> be using Kylin's Cube:
>
> a) Data analyst team (DA) who is using PowerBI (via ODBC or mez), superset
> to access kylin Cube as well.
> b) Data science team (DS) who is using Pyspark, SparkML currently
> assessing HDFS and parquet directly as raw file.
> c) AI team who is using various interfaces like Java, Python, Scala to
> assess HDFS and parquet directly as raw file.
>
> I have two questions:
>
> 1) For team a) DA: when using the ODBC or mez connector, if the Cube not
> ready then I guess the PowerBI is accessing HIVE parquet file, is n't it?
> 2) For DS/AI team: you see they are accessing the raw hdfs/parquet then
> how can Hive/Kylin provide more merits to these teams? For this question, I
> imagine of OLAP speed or computed metrics etc but I am not sure so please
> advise
>
> Thank you very much
>
>
>
>
> On Mon, Nov 13, 2023 at 2:40 PM Xiaoxiang Yu  wrote:
>
>> Do you have any specific business scenario? Looks like there is
>> not such real usecase at the moment.
>>
>>
>>
>> --
>> *Best wishes to you ! *
>> *From :**Xiaoxiang Yu*
>>
>>
>> At 2023-11-13 11:36:35, "Nam Đỗ Duy via user" 
>> wrote:
>>
>> Dear Sir/Madam
>>
>> I am persuading my company to use kylin as olap platform so please kindly
>> share with me (inbox me if you hesitate to share publicly) your real
>> use-cases to help me answer our boss’s question:
>>
>> 1. Which companies are using kylin now
>> 2. How do you use kylin’s capabilities in your AI/ML projects
>>
>> Thank you very much for your valuable time and support
>>
>>


Re: Kylin real usecase on AI/ML (data science) project

2023-11-13 Thread Xiaoxiang Yu
Hi,
Question 1:
You are almost right.
If the cube is not ready, Kylin will use SparkSQL to execute the query
directly on the original tables.


Question 2:
It is possible but very hard.
The index data is saved in Parquet format, so it is possible to read it with
Spark, but the column names are encoded, so you cannot tell which columns are
useful to you. The mapping from the Parquet files' columns to the model's
dimensions and measures is stored in Kylin's metastore, so knowledge of the
Kylin source code is required to make good use of the model/index files when
reading them directly.


If there were a Python library (like
https://github.com/Kyligence/kylinpy/tree/master) that lets you send SQL to
Kylin, would it be helpful to your data science team?
The following is an example.




```
>>> import sqlalchemy as sa
>>> import pandas as pd
>>> # kylinpy must be installed; it provides the "kylin://" SQLAlchemy dialect
>>> kylin_engine = sa.create_engine('kylin://ADMIN:KYLIN@sandbox/learn_kylin?timeout=60&is_debug=1')
>>> sql = 'select * from kylin_sales limit 10'
>>> pd.read_sql(sql, kylin_engine)
```










--

Best wishes to you ! 
From :Xiaoxiang Yu




At 2023-11-13 16:02:20, "Nam Đỗ Duy via user"  wrote:

Hi Xiaoxiang,


Basically you can imagine the scenario that there will be3 teams who will be 
using Kylin's Cube: 


a) Data analyst team (DA) who is using PowerBI (via ODBC or mez), superset to 
access kylin Cube as well.
b) Data science team (DS) who is using Pyspark, SparkML currently assessing 
HDFS and parquet directly as raw file.
c) AI team who is using various interfaces like Java, Python, Scala to assess 
HDFS and parquet directly as raw file.


I have two questions:


1) For team a) DA: when using the ODBC or mez connector, if the Cube not ready 
then I guess the PowerBI is accessing HIVE parquet file, is n't it?

2) For DS/AI team: you see they are accessing the raw hdfs/parquet then how can 
Hive/Kylin provide more merits to these teams? For this question, I imagine of 
OLAP speed or computed metrics etc but I am not sure so please advise


Thank you very much








On Mon, Nov 13, 2023 at 2:40 PM Xiaoxiang Yu  wrote:

Do you have any specific business scenario? Looks like there is 
not such real usecase at the moment. 







--

Best wishes to you ! 
From :Xiaoxiang Yu




At 2023-11-13 11:36:35, "Nam Đỗ Duy via user"  wrote:

Dear Sir/Madam


I am persuading my company to use kylin as olap platform so please kindly share 
with me (inbox me if you hesitate to share publicly) your real use-cases to 
help me answer our boss’s question:


1. Which companies are using kylin now
2. How do you use kylin’s capabilities in your AI/ML projects


Thank you very much for your valuable time and support

Re: Kylin real usecase on AI/ML (data science) project

2023-11-13 Thread Nam Đỗ Duy via user
Hi Xiaoxiang,

Basically, you can imagine a scenario in which three teams will be using
Kylin's cube:

a) The data analyst (DA) team, which uses Power BI (via ODBC or mez) and
Superset to access the Kylin cube.
b) The data science (DS) team, which uses PySpark and SparkML and currently
accesses HDFS and Parquet directly as raw files.
c) The AI team, which uses various interfaces such as Java, Python, and Scala
to access HDFS and Parquet directly as raw files.

I have two questions:

1) For team a) DA: when using the ODBC or mez connector, if the cube is not
ready, then I guess Power BI is accessing the Hive Parquet files, isn't it?
2) For the DS/AI teams: they are accessing the raw HDFS/Parquet files, so how
can Hive/Kylin provide more benefits to these teams? I imagine OLAP speed or
computed metrics, etc., but I am not sure, so please advise.

Thank you very much




On Mon, Nov 13, 2023 at 2:40 PM Xiaoxiang Yu  wrote:

> Do you have any specific business scenario? Looks like there is
> not such real usecase at the moment.
>
>
>
> --
> *Best wishes to you ! *
> *From :**Xiaoxiang Yu*
>
>
> At 2023-11-13 11:36:35, "Nam Đỗ Duy via user" 
> wrote:
>
> Dear Sir/Madam
>
> I am persuading my company to use kylin as olap platform so please kindly
> share with me (inbox me if you hesitate to share publicly) your real
> use-cases to help me answer our boss’s question:
>
> 1. Which companies are using kylin now
> 2. How do you use kylin’s capabilities in your AI/ML projects
>
> Thank you very much for your valuable time and support
>
>


Re:Kylin real usecase on AI/ML (data science) project

2023-11-12 Thread Xiaoxiang Yu
I'm just curious.
How much did MDX and AI/ML influence your company's decision to use Kylin 5?
Is it a nice-to-have or a must-have requirement?


Thanks.

--

Best wishes to you ! 
From :Xiaoxiang Yu




At 2023-11-13 15:38:17, "Xiaoxiang Yu"  wrote:

Do you have any specific business scenario? Looks like there is 
not such real usecase at the moment. 







--

Best wishes to you ! 
From :Xiaoxiang Yu




At 2023-11-13 11:36:35, "Nam Đỗ Duy via user"  wrote:

Dear Sir/Madam


I am persuading my company to use kylin as olap platform so please kindly share 
with me (inbox me if you hesitate to share publicly) your real use-cases to 
help me answer our boss’s question:


1. Which companies are using kylin now
2. How do you use kylin’s capabilities in your AI/ML projects


Thank you very much for your valuable time and support

Re:Has anyone used MDX in Kylin 5.0 ?

2023-11-12 Thread Xiaoxiang Yu
Currently I think there is no such use case.




--

Best wishes to you ! 
From :Xiaoxiang Yu




At 2023-11-13 11:33:13, "Nam Đỗ Duy via user"  wrote:

Dear Sir/Madam


I found document on MDX on Kylin 4.0 but still not found MDX on document of 5.0 


Has anyone been using MDX on Kylin 5.0 and kindly provide me some feed back on 
this topic


Thank you very much and best regards

Re:Kylin real usecase on AI/ML (data science) project

2023-11-12 Thread Xiaoxiang Yu
Do you have a specific business scenario? It looks like there is
no such real use case at the moment.







--

Best wishes to you ! 
From :Xiaoxiang Yu




At 2023-11-13 11:36:35, "Nam Đỗ Duy via user"  wrote:

Dear Sir/Madam


I am persuading my company to use kylin as olap platform so please kindly share 
with me (inbox me if you hesitate to share publicly) your real use-cases to 
help me answer our boss’s question:


1. Which companies are using kylin now
2. How do you use kylin’s capabilities in your AI/ML projects


Thank you very much for your valuable time and support

Kylin real usecase on AI/ML (data science) project

2023-11-12 Thread Nam Đỗ Duy via user
Dear Sir/Madam

I am trying to persuade my company to use Kylin as its OLAP platform, so
please kindly share with me (message me privately if you hesitate to share
publicly) your real use cases to help me answer our boss's questions:

1. Which companies are using Kylin now?
2. How do you use Kylin's capabilities in your AI/ML projects?

Thank you very much for your valuable time and support


Has anyone used MDX in Kylin 5.0 ?

2023-11-12 Thread Nam Đỗ Duy via user
Dear Sir/Madam

I found documentation on MDX for Kylin 4.0 but have not found anything on MDX
in the 5.0 documentation.

Has anyone been using MDX on Kylin 5.0, and could you kindly give me some
feedback on this topic?

Thank you very much and best regards


Integrate Kylin with Power BI report server

2023-11-06 Thread Nam Đỗ Duy via user
Dear Sir/Madam

We are using Power BI Report Server on-premises. Please kindly advise on
integration guidelines for two scenarios:

1. POC period:
Power BI Report Server: Windows 10
Kylin 5.0 in the Docker image downloaded from the official Kylin website

2. Development period:
Power BI Report Server: Windows 10
We plan not to use Docker this time: Kylin 5.0, Hive, ZooKeeper, and Hadoop
on Ubuntu

Thank you very much and best regards


Re: How to set up a tree-structured hierarchical dimension

2023-10-25 Thread ShaoFeng Shi
I don't think this is supported.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC,
Apache Incubator PMC,
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org




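雨后初晴 <745579...@qq.com> wrote on Wed, Oct 25, 2023 at 00:49:

>
> There is an "organization" dimension whose number of levels is not fixed. Its structure is similar to: orgId, parentOrgId, orgName, isLeaf, which represent the organization ID, the parent organization ID, the organization name, and whether the organization is a leaf. Each organization has exactly one direct parent.
> The fact table only contains data for leaf organization IDs, but queries need data for organizations at any level. How should this kind of dimension be defined in Kylin?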
>


Re: How to set up a tree-structured hierarchical dimension

2023-10-24 Thread 吴志民
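Hi 雨后初晴,


Presumably you now have a fact table keyed by orgId and a dimension table
that contains the organization relationships. One approach I can think of is
to expand the tree relationships in the dimension table through ETL by adding
a leafOrgId field. The dimension table then uses orgId and leafOrgId as a
composite primary key and contains one row from each node to every leaf node
beneath it. In Kylin, join the fact table's orgId with the dimension table's
leafOrgId, and use the dimension table's orgId field as the aggregation
column in queries.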
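
The ETL flattening step suggested above can be sketched in Python as follows.
This is only an illustration under stated assumptions: the input is a list of
(orgId, parentOrgId) pairs with None as the parent of the root, the tree has
no cycles, and flatten_org_tree is a hypothetical helper name, not a Kylin
API. It emits one (orgId, leafOrgId) row for every organization and every
leaf beneath it; a leaf is paired with itself.

```python
from collections import defaultdict


def flatten_org_tree(rows):
    """Expand (org_id, parent_org_id) rows into (org_id, leaf_org_id) pairs."""
    children = defaultdict(list)
    org_ids = []
    for org_id, parent_id in rows:
        org_ids.append(org_id)
        if parent_id is not None:
            children[parent_id].append(org_id)

    def leaves_under(org_id):
        # Iterative depth-first walk; nodes without children are leaves.
        stack, found = [org_id], []
        while stack:
            node = stack.pop()
            kids = children.get(node, [])
            if kids:
                stack.extend(kids)
            else:
                found.append(node)
        return found

    return sorted(
        (org_id, leaf) for org_id in org_ids for leaf in leaves_under(org_id)
    )


# Example tree: 1 is the root; 2 and 3 are its children; 4 and 5 are under 2.
rows = [(1, None), (2, 1), (3, 1), (4, 2), (5, 2)]
pairs = flatten_org_tree(rows)
# pairs == [(1, 3), (1, 4), (1, 5), (2, 4), (2, 5), (3, 3), (4, 4), (5, 5)]
```

Loading these pairs as the dimension table lets the fact table join on
leafOrgId while queries group by orgId at any level of the tree.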



---Original message---
From: "雨后初晴" <745579...@qq.com>
Sent: Wednesday, October 25, 2023, 00:49
To: "user"

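How to set up a tree-structured hierarchical dimension

2023-10-24 Thread 雨后初晴
There is an "organization" dimension whose number of levels is not fixed. Its structure is similar to: orgId, parentOrgId, orgName, isLeaf, which represent the organization ID, the parent organization ID, the organization name, and whether the organization is a leaf. Each organization has exactly one direct parent.
The fact table only contains data for leaf organization IDs, but queries need data for organizations at any level. How should this kind of dimension be defined in Kylin?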

Re: Unsubscribe

2023-10-12 Thread lee
Unsubscribe

> On October 11, 2023, at 13:07, 许颖众 wrote:
> 
>  
>  
> From: hit_la...@126.com on behalf of Xiaoxiang Yu
> Sent: August 2, 2023 17:39
> To: user@kylin.apache.org; leil...@zqykj.com; 1163629...@qq.com
> Subject: Re: Unsubscribe
>
> Hey, to unsubscribe from the user mailing list, please follow these steps:
>
> First, send an email with any text to user-unsubscr...@kylin.apache.org.
> Within several minutes you will receive a confirmation email titled
> "confirm unsubscribe from user@kylin.apache.org".
>
> Then, reply (with any text) to that confirmation email to confirm your
> unsubscribe request.
> You will receive a final email titled "GOODBYE from user@kylin.apache.org",
> which means you have unsubscribed successfully.
>
> The official guide:
> https://www.apache.org/foundation/mailinglists.html#request-confirmation
>  
> --
> Best wishes to you ! 
> From :Xiaoxiang Yu
>  
> On 2023-08-02 16:53:32, "雷利娜" <leil...@zqykj.com> wrote:
>
> Unsubscribe
> Hello, for work reasons I would like to unsubscribe from all related emails.
> Thank you!
> 
> Disclaimer:The information transmitted is intended only for the person or 
> entity to which it is addressed and may contain confidential and/or 
> privileged material. Any review, retransmission, dissemination or other use 
> of, or taking of any action in reliance upon, this information by persons or 
> entities other than the intended recipient is prohibited. If you received 
> this in error , please contact the sender and delete the material from any 
> computer .
> 



Unsubscribe

2023-10-10 Thread 许颖众


From: hit_la...@126.com on behalf of Xiaoxiang Yu
Sent: August 2, 2023 17:39
To: user@kylin.apache.org; leil...@zqykj.com; 1163629...@qq.com
Subject: Re: Unsubscribe

Hey, to unsubscribe from the user mailing list, please follow these steps:

First, send an email with any text to user-unsubscr...@kylin.apache.org.
Within several minutes you will receive a confirmation email titled
"confirm unsubscribe from user@kylin.apache.org".

Then, reply (with any text) to that confirmation email to confirm your
unsubscribe request.
You will receive a final email titled "GOODBYE from user@kylin.apache.org",
which means you have unsubscribed successfully.

The official guide:
https://www.apache.org/foundation/mailinglists.html#request-confirmation



--
Best wishes to you !
From :Xiaoxiang Yu



On 2023-08-02 16:53:32, "雷利娜" <leil...@zqykj.com> wrote:
Unsubscribe
Hello, for work reasons I would like to unsubscribe from all related emails.
Thank you!


Disclaimer:The information transmitted is intended only for the person or 
entity to which it is addressed and may contain confidential and/or privileged 
material. Any review, retransmission, dissemination or other use of, or taking 
of any action in reliance upon, this information by persons or entities other 
than the intended recipient is prohibited. If you received this in error , 
please contact the sender and delete the material from any computer .


Docker Image for Kylin 5.0-beta is available now

2023-09-08 Thread Xiaoxiang Yu
The following commands let you preview Kylin 5 quickly:


   docker run -d \
  --name Kylin5-Machine \
  --hostname Kylin5-Machine \
  -m 8G \
  -p 7070:7070 \
  -p 8088:8088 \
  -p 9870:9870 \
  -p 8032:8032 \
  -p 8042:8042 \
  -p 2181:2181 \
  apachekylin/apache-kylin-standalone:5.0-beta


docker logs --follow Kylin5-Machine




Please visit https://hub.docker.com/r/apachekylin/apache-kylin-standalone for 
more information. 
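
After the container starts, it can take a while before the web UI responds.
The following Python sketch, which is not part of the image documentation,
simply polls a TCP port until it accepts connections. It assumes the
container runs on localhost and that port 7070 (mapped in the command above)
serves the Kylin web UI; wait_for_port is a hypothetical helper name. An open
port does not guarantee that Kylin has finished initialising, so treat this
as a rough readiness check only.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=300.0, interval_s=5.0,
                  connect=socket.create_connection, sleep=time.sleep,
                  clock=time.monotonic):
    """Return True once host:port accepts a TCP connection, False on timeout.

    connect, sleep and clock are injectable so the loop can be exercised
    without a real network.
    """
    deadline = clock() + timeout_s
    while True:
        try:
            conn = connect((host, port), timeout=interval_s)
        except OSError:
            # Not reachable yet: give up after the deadline, otherwise retry.
            if clock() >= deadline:
                return False
            sleep(interval_s)
        else:
            conn.close()
            return True


# Example call (performs real network I/O, so it is left commented out):
# wait_for_port("localhost", 7070)
```

Following the container logs, as shown above, remains the more reliable way
to see when startup has actually finished.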



--

Best wishes to you ! 
From :Xiaoxiang Yu

Re: Kylin defaut storage system is HDFS?

2023-09-02 Thread marc nicole
Hi Yu,

This link, https://kylin.apache.org/docs31/tutorial/setup_jdbc_datasource.html,
states: "Since v2.3.0 Apache Kylin starts to support JDBC as the
third type of data source (after Hive, Kafka)".

So, according to the link above, the answer to my question of whether I can
use MySQL as an alternative to Hive is yes, or am I wrong?

On Mon, Aug 28, 2023 at 04:48, Xiaoxiang Yu wrote:

> Hi,
> For Kylin 5, you have to use distributed storage. The default choice is
> HDFS, and the alternative is cloud storage (like S3); you can NOT deploy
> and run Kylin without distributed storage.
> Besides, you need an RDBMS as the metastore, ZooKeeper for service
> discovery, a Spark cluster as the compute service, and a Hive Metastore
> for looking up databases and tables.
> Finally, HBase is NOT necessary at all for Kylin 4.0 or higher.
>
> For the question 'Could I use Kylin with just MySQL + Sqoop? (no Hive)',
> the answer is no: you need to install and deploy ZooKeeper, distributed
> storage (HDFS or cloud storage), a Spark cluster, and a Hive Metastore.
> Here is a diagram that may be helpful:
> https://kylin.apache.org/images/blog/kylin4_on_cloud/3_kylin_cluster.jpg
>
> Here are some links:
> - https://kylin.apache.org/blog/2022/04/20/kylin4-on-cloud-part1/
> -
> https://kylin.apache.org/5.0/docs/deployment/on-premises/installation/platform/install_on_apache_hadoop
>
>
> 
> With warm regard
> Xiaoxiang Yu
>
>
>
> On Sat, Aug 26, 2023 at 8:03 PM marc nicole  wrote:
>
>> Hello,
>>
>> I have a few questions regarding storage for Kylin:
>>
>> I was wondering whether Kylin would work normally if I don't configure it
>> to work with any storage tool (such as MySQL with Sqoop, or Hive). Would it
>> then automatically use HDFS?
>>
>> Also, is configuring HBase necessary?
>>
>> Could I use Kylin with just MySQL + Sqoop? (no Hive)
>> What is the use of HBase if the storage normally used is Hive?
>>
>> Thanks. Regards
>>
>


[Announce] Apache Kylin 5.0.0-beta released

2023-08-30 Thread Xiaoxiang Yu
The Apache Kylin team is pleased to announce the immediate availability of
the 5.0.0-beta release.

This is the second release for Kylin 5, with 62 new features/improvements
and 124 bug fixes.

You can download the source release and binary packages from Apache Kylin's
download page: https://kylin.apache.org/5.0/docs/download

Apache Kylin is an open-source Distributed Analytical Data Warehouse for
Big Data; it was designed to provide OLAP (Online Analytical Processing)
capability in the big data era. By renovating the multi-dimensional cube
and precalculation technology on Hadoop and Spark, Kylin is able to achieve
near-constant query speed regardless of the ever-growing data volume.
Reducing query latency from minutes to sub-second, Kylin brings online
analytics back to big data.

Apache Kylin lets you query billions of rows at sub-second latency in 3
steps:
1. Identify a Star/Snowflake Schema on Hadoop.
2. Build Model from the identified tables.
3. Query using ANSI-SQL and get results in sub-second, via ODBC, JDBC or
RESTful API.

Thanks to everyone who has contributed to this release.

We welcome your help and feedback. For more information on how to report
problems, and to get involved, visit the project website at
https://kylin.apache.org/


With warm regard
Xiaoxiang Yu


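Unsubscribe

2023-08-29 Thread 雷利娜
Unsubscribe

 Original message replied to
 From: 李甜彪
 Date: 2023-07-31 17:52
 To: user@kylin.apache.org
 Subject: Problems encountered while using Kylin

While using Kylin 4.0.3, when I build data for different date partitions on
the same day, the dimension table has to be switched between builds. However,
the last build of the day causes the data for the other dates built that day
to also use the dimension table associated with that last build. What I want
is for the dimension table data to change with each build while the data
already built that day stays unchanged. Is there a way to achieve this?

李甜彪
ltb1...@163.com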
 


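Unsubscribe

2023-08-29 Thread Lee
Unsubscribe


 Original message replied to
 From: 李甜彪
 Date: 2023-07-31 17:52
 To: user@kylin.apache.org
 Subject: Problems encountered while using Kylin

While using Kylin 4.0.3, when I build data for different date partitions on
the same day, the dimension table has to be switched between builds. However,
the last build of the day causes the data for the other dates built that day
to also use the dimension table associated with that last build. What I want
is for the dimension table data to change with each build while the data
already built that day stays unchanged. Is there a way to achieve this?

李甜彪
ltb1...@163.com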

Registration open for Community Over Code North America

2023-08-28 Thread Rich Bowen
Hello! Registration is still open for the upcoming Community Over Code
NA event in Halifax, NS! We invite you to register for the event at
https://communityovercode.org/registration/

Apache Committers, note that you have a special discounted rate for the
conference at US$250. To take advantage of this rate, use the special
code sent to the committers@ list by Brian Proffitt earlier this month.

If you are in need of an invitation letter, please consult the
information at https://communityovercode.org/visa-letter/

Please see https://communityovercode.org/ for more information about
the event, including how to make reservations for discounted hotel
rooms in Halifax. Discounted rates will only be available until Sept.
5, so reserve soon!

--Rich, for the event planning team


Re: Kylin defaut storage system is HDFS?

2023-08-27 Thread Xiaoxiang Yu
Hi,
For Kylin 5, you have to use a distributed storage. The default choice is
HDFS, and the alternative is cloud storage (like S3). You can NOT deploy
and run Kylin without a distributed storage.
Besides, you need an RDBMS as a metastore, Zookeeper for service discovery,
a Spark cluster as the compute service, and a Hive Metastore for looking up
databases and tables.
Finally, HBase is NOT required for Kylin 4.0 or higher.

For the question 'Could I use Kylin with just MySQL + Sqoop? (no Hive)',
the answer is no: you need to install and deploy Zookeeper, a distributed
storage (HDFS or cloud storage), a Spark cluster and a Hive metastore.
Here is a diagram that may be helpful:
https://kylin.apache.org/images/blog/kylin4_on_cloud/3_kylin_cluster.jpg

Here are some links:
- https://kylin.apache.org/blog/2022/04/20/kylin4-on-cloud-part1/
- https://kylin.apache.org/5.0/docs/deployment/on-premises/installation/platform/install_on_apache_hadoop
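
The requirements listed in this reply can be read as a simple checklist. The
following is only a rough sketch in Python: the dictionary keys, the function
name and the idea of treating MySQL as the RDBMS metastore are illustrative
assumptions, not Kylin configuration or API.

```python
# Hypothetical checklist; component names paraphrase the reply above.
REQUIRED = {
    "distributed_storage": "HDFS or cloud storage such as S3",
    "rdbms_metastore": "an RDBMS holding Kylin metadata",
    "zookeeper": "service discovery",
    "spark_cluster": "compute service",
    "hive_metastore": "looking up databases and tables",
}


def missing_components(available):
    """Return the required components absent from `available`, sorted by name."""
    return sorted(set(REQUIRED) - set(available))


# Assumption: in a "MySQL + Sqoop only" setup, MySQL could at most
# play the role of the RDBMS metastore.
mysql_sqoop_only = {"rdbms_metastore"}

for name in missing_components(mysql_sqoop_only):
    print(f"missing: {name} ({REQUIRED[name]})")
```

HBase does not appear in the checklist at all, which matches the statement
that it is not needed for Kylin 4.0 or higher.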



With warm regard
Xiaoxiang Yu



On Sat, Aug 26, 2023 at 8:03 PM marc nicole  wrote:

> Hello,
>
> I have few questions regarding storage mean for Kylin:
>
> I was wondering if Kylin would work normally if I don't configure it to
> work with any storage tool (as MySQL with Sqoop Or with Hive)? It would
> then automatically use HDFS ?
>
> Also is configuring HBASE necessary?
>
> Could I use Kylin with just MySQL + Sqoop? (no Hive)
> What the use of HBase if the normal used storage is Hive?
>
> Thanks. Regards
>


Kylin defaut storage system is HDFS?

2023-08-26 Thread marc nicole
Hello,

I have a few questions regarding the means of storage for Kylin:

I was wondering whether Kylin would work normally if I don't configure it
to work with any storage tool (such as MySQL with Sqoop, or Hive). Would it
then automatically use HDFS?

Also, is configuring HBase necessary?

Could I use Kylin with just MySQL + Sqoop? (no Hive)
What is the use of HBase if the storage normally used is Hive?

Thanks. Regards


Re: measure column showing 1 as values instead of the actual values in Kylin SQL Query answer table

2023-08-02 Thread P.F. ZHAN
Do you mean extracting data from the fact table? Define a model based on
this table with the dimensions and this measure, then create an aggregation
group with these dimensions and the measure, and build it to get the data
you want. Or do you mean the metadata file? In that case, just query the
database or dump the metadata to check what you need.
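
As a toy illustration of what building an aggregation over dimensions and a
measure amounts to (this is plain Python, not Kylin's implementation; the
sample rows and the function name are made up, and SUM is assumed as the
aggregation), the precomputed result for a set of dimensions could be
pictured like this:

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in the att1 / att2 / count_measure layout from the thread.
SAMPLE_CSV = """att1,att2,count_measure
a,x,3
a,x,4
a,y,5
b,x,7
"""


def pre_aggregate(csv_text, dimensions, measure):
    """Sum `measure` for every combination of values of `dimensions`."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = tuple(row[d] for d in dimensions)
        totals[key] += int(row[measure])
    return dict(totals)


by_both = pre_aggregate(SAMPLE_CSV, ["att1", "att2"], "count_measure")
by_att1 = pre_aggregate(SAMPLE_CSV, ["att1"], "count_measure")
print(by_both)
print(by_att1)
```

Each dictionary plays the role of one precomputed combination of dimensions;
a query that groups by att1 alone would read the second one instead of
scanning the raw rows.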

On Wed, Aug 2, 2023 at 20:22 marc nicole  wrote:

> The dataset (CSV) is in this form:
>
> att1 | att2 | count_measure
> str1 | str2 | int1
> ...
>
> How do I extract the fact table (the count_measure column) and the
> dimensions from it, in the datasource / model definition in Kylin?
>
> Le mer. 2 août 2023 à 13:32, marc nicole  a écrit :
>
>> In the model, the measure definition is as follows (I don't understand
>> Chinese):
>>
>> {
>>   "name": "COUNT1",
>>   "function": {
>>     "expression": "COUNT",
>>     "parameter": {
>>       "type": "column",
>>       "value": "attributeName.COUNT1"
>>     },
>>     "returntype": "bigint"
>>   }
>> },
>>
>>
>> Le mer. 2 août 2023 à 05:31, Xiaoxiang Yu  a écrit :
>>
>>> Do you check the answer by Hive/SparkSQL, do Hive/SparkSQL give the
>>> expected answer?
>>> If you have checked, I think you can give us you cube
>>> defination(CubeDesc in Json) and the SQL statement you queried, so we can
>>> discuss in detail?
>>>
>>>
>>>
>>> --
>>> *Best wishes to you ! *
>>> *From :**Xiaoxiang Yu*
>>>
>>>
>>> At 2023-08-02 04:35:04, "marc nicole"  wrote:
>>>
>>> The measure is of type column (not constant) and is bigint. I selected
>>> the measure from the dropdown corretly as well. measure column returns 1
>>> for all column values instead of actual values when querying the cube. What
>>> could be the underlying problem? cube or model defining? or maybe data
>>> source attribute types?
>>>
>>> Maybe I should create a lookup table with the fact table (which I am not
>>> doing so far)?
>>>
>>> Why the measure column in query answer is showing only 1 as values ??
>>>
>>>
>>>
>>>


Re: measure column showing 1 as values instead of the actual values in Kylin SQL Query answer table

2023-08-02 Thread marc nicole
The dataset (CSV) is in this form:

att1 | att2 | count_measure
str1 | str2 | int1
...

How do I extract the fact table (the count_measure column) and the
dimensions from it, in the datasource / model definition in Kylin?

Le mer. 2 août 2023 à 13:32, marc nicole  a écrit :

> In the model, the measure definition is as follows (I don't understand
> Chinese):
>
> {
>   "name": "COUNT1",
>   "function": {
>     "expression": "COUNT",
>     "parameter": {
>       "type": "column",
>       "value": "attributeName.COUNT1"
>     },
>     "returntype": "bigint"
>   }
> },
>
>
> Le mer. 2 août 2023 à 05:31, Xiaoxiang Yu  a écrit :
>
>> Do you check the answer by Hive/SparkSQL, do Hive/SparkSQL give the
>> expected answer?
>> If you have checked, I think you can give us you cube defination(CubeDesc
>> in Json) and the SQL statement you queried, so we can discuss in detail?
>>
>>
>>
>> --
>> *Best wishes to you ! *
>> *From :**Xiaoxiang Yu*
>>
>>
>> At 2023-08-02 04:35:04, "marc nicole"  wrote:
>>
>> The measure is of type column (not constant) and is bigint. I selected
>> the measure from the dropdown corretly as well. measure column returns 1
>> for all column values instead of actual values when querying the cube. What
>> could be the underlying problem? cube or model defining? or maybe data
>> source attribute types?
>>
>> Maybe I should create a lookup table with the fact table (which I am not
>> doing so far)?
>>
>> Why the measure column in query answer is showing only 1 as values ??
>>
>>
>>
>>


Re: measure column showing 1 as values instead of the actual values in Kylin SQL Query answer table

2023-08-02 Thread marc nicole
In the model, the measure definition is as follows (I don't understand
Chinese):

{
  "name": "COUNT1",
  "function": {
    "expression": "COUNT",
    "parameter": {
      "type": "column",
      "value": "attributeName.COUNT1"
    },
    "returntype": "bigint"
  }
},
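
One possible reading of this definition, which the thread does not confirm,
is that the "COUNT" expression counts rows instead of returning the stored
values. If every combination of dimension values occurs exactly once in the
source data, such a count is 1 for every row, whereas a SUM over the same
column would return the actual values. A minimal Python sketch with made-up
rows (the data and function name are hypothetical):

```python
from collections import defaultdict

# Hypothetical rows: (att1, att2, count_measure)
ROWS = [("a", "x", 30), ("a", "y", 12), ("b", "x", 7)]


def aggregate(rows, func):
    """Group rows by (att1, att2) and apply `func` to each group's values."""
    groups = defaultdict(list)
    for att1, att2, value in rows:
        groups[(att1, att2)].append(value)
    return {key: func(values) for key, values in groups.items()}


count_result = aggregate(ROWS, len)  # analogous to COUNT(column)
sum_result = aggregate(ROWS, sum)    # analogous to SUM(column)

print(count_result)
print(sum_result)
```

Here every entry of `count_result` is 1, while `sum_result` carries the
original values 30, 12 and 7.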


Le mer. 2 août 2023 à 05:31, Xiaoxiang Yu  a écrit :

> Do you check the answer by Hive/SparkSQL, do Hive/SparkSQL give the
> expected answer?
> If you have checked, I think you can give us you cube defination(CubeDesc
> in Json) and the SQL statement you queried, so we can discuss in detail?
>
>
>
> --
> *Best wishes to you ! *
> *From :**Xiaoxiang Yu*
>
>
> At 2023-08-02 04:35:04, "marc nicole"  wrote:
>
> The measure is of type column (not constant) and is bigint. I selected the
> measure from the dropdown corretly as well. measure column returns 1 for
> all column values instead of actual values when querying the cube. What
> could be the underlying problem? cube or model defining? or maybe data
> source attribute types?
>
> Maybe I should create a lookup table with the fact table (which I am not
> doing so far)?
>
> Why the measure column in query answer is showing only 1 as values ??
>
>
>
>


Re: Unsubscribe

2023-08-02 Thread Xiaoxiang Yu
Hey, to unsubscribe from the user mailing list, please follow these steps:


First, send an email with any text to user-unsubscr...@kylin.apache.org.
Within a few minutes you will receive a confirmation email titled "confirm
unsubscribe from user@kylin.apache.org".


Then, reply (with any text) to that confirmation email to confirm your
unsubscribe request.
You will receive a final email titled "GOODBYE from user@kylin.apache.org",
which means you have unsubscribed successfully.


The official guide:
https://www.apache.org/foundation/mailinglists.html#request-confirmation




--

Best wishes to you ! 
From :Xiaoxiang Yu




At 2023-08-02 16:53:32, "雷利娜"  wrote:

Unsubscribe

Hello, for work reasons I would like to unsubscribe from all related emails.
Thank you!

Unsubscribe

2023-08-02 Thread 雷利娜
Unsubscribe
Hello, for work reasons I would like to unsubscribe from all related emails.
Thank you!


Re: measure column showing 1 as values instead of the actual values in Kylin SQL Query answer table

2023-08-02 Thread Xiaoxiang Yu
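I guess Shaofeng has already answered your question in the WeChat group.
Here is his answer:




Make the columns from the dimension table "normal" dimensions. Their data
is then persisted into the cube, and building a new segment will not change
the results of earlier builds.
If a column is not a "normal" dimension, it lives in the dimension table
snapshot, so its data changes whenever the snapshot is refreshed.


Why not consider using normal dimensions? (Once a column is set as a normal
dimension, no dimension table snapshot is generated for it.)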
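
The difference between a "normal" dimension and a snapshot-backed one can be
pictured with a toy model. This is only a sketch under simplifying
assumptions: the class, its fields and the sample data are hypothetical and
do not reflect Kylin's storage format or API.

```python
class ToyCube:
    """Toy model: segments keep their own rows, the snapshot is shared."""

    def __init__(self):
        self.segments = {}  # segment name -> rows stored at build time
        self.snapshot = {}  # latest dimension table snapshot

    def build(self, name, fact_rows, dim_table):
        # "Normal" dimension: the looked-up value is copied into the segment.
        self.segments[name] = [
            {"key": key, "normal_dim": dim_table[key], "amount": amount}
            for key, amount in fact_rows
        ]
        # Snapshot-backed column: each build replaces the shared snapshot.
        self.snapshot = dict(dim_table)

    def query(self, name):
        # Returns (normal dimension, snapshot lookup, amount) per stored row.
        return [
            (row["normal_dim"], self.snapshot[row["key"]], row["amount"])
            for row in self.segments[name]
        ]


cube = ToyCube()
cube.build("2023-07-30", [("k1", 10)], {"k1": "old label"})
cube.build("2023-07-31", [("k1", 20)], {"k1": "new label"})
print(cube.query("2023-07-30"))
```

After the second build, the earlier segment still reports "old label" for the
normal dimension, while the snapshot lookup already returns "new label",
which is the behavior described in the question.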







--

Best wishes to you ! 
From :Xiaoxiang Yu




At 2023-08-02 15:45:57, "李甜彪"  wrote:

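The problem is that the dimension data has to change with every build, which
causes the earlier builds to use the latest dimension data as well. When the
builds fall on different days, this problem does not occur, so it comes down
to how the dimension snapshot is updated. How can I make sure that, when the
dimension data changes, the dimension table associated with earlier builds
stays in the state it was in at build time?


李甜彪
ltb1...@163.com

 Replied Message 
From: P.F. ZHAN
Date: 8/2/2023 15:34
Subject: Re: measure column showing 1 as values instead of the actual values
in Kylin SQL Query answer table

In that case, shouldn't you set the columns you need to query as dimensions,
so that they are precomputed and stored in the cube? Dimension data in the
cube does not change unless it is refreshed.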




On Wed, Aug 2, 2023 at 11:35 李甜彪  wrote:

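Could you help me with a problem I have run into? How can I stop the
dimension table snapshot from always taking the latest data?
While using Kylin 4.0.3, when I build data for different date partitions on
the same day, the dimension table has to be switched between builds. However,
the last build of the day causes the data for the other dates built that day
to also use the dimension table associated with that last build. What I want
is for the dimension table data to change with each build while the data
already built that day stays unchanged. Is there a way to achieve this?

李甜彪
ltb1...@163.com

 Replied Message 
From: Xiaoxiang Yu
Date: 8/2/2023 11:31
Subject: Re:measure column showing 1 as values instead of the actual values
in Kylin SQL Query answer table

Have you checked the answer with Hive/SparkSQL? Do Hive/SparkSQL give the
expected answer?
If you have, could you share your cube definition (CubeDesc in JSON) and the
SQL statement you ran, so we can discuss it in detail?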







--

Best wishes to you ! 
From :Xiaoxiang Yu




At 2023-08-02 04:35:04, "marc nicole"  wrote:

The measure is of type column (not constant) and is bigint. I selected the 
measure from the dropdown corretly as well. measure column returns 1 for all 
column values instead of actual values when querying the cube. What could be 
the underlying problem? cube or model defining? or maybe data source attribute 
types?

Maybe I should create a lookup table with the fact table (which I am not doing 
so far)?


Why the measure column in query answer is showing only 1 as values ??








Re: measure column showing 1 as values instead of the actual values in Kylin SQL Query answer table

2023-08-02 Thread 李甜彪
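The problem is that the dimension data has to change with every build, which
causes the earlier builds to use the latest dimension data as well. When the
builds fall on different days, this problem does not occur, so it comes down
to how the dimension snapshot is updated. How can I make sure that, when the
dimension data changes, the dimension table associated with earlier builds
stays in the state it was in at build time?


李甜彪
ltb1...@163.com

 Replied Message 
From: P.F. ZHAN
Date: 8/2/2023 15:34
Subject: Re: measure column showing 1 as values instead of the actual values
in Kylin SQL Query answer table

In that case, shouldn't you set the columns you need to query as dimensions,
so that they are precomputed and stored in the cube? Dimension data in the
cube does not change unless it is refreshed.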




On Wed, Aug 2, 2023 at 11:35 李甜彪  wrote:

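Could you help me with a problem I have run into? How can I stop the
dimension table snapshot from always taking the latest data?
While using Kylin 4.0.3, when I build data for different date partitions on
the same day, the dimension table has to be switched between builds. However,
the last build of the day causes the data for the other dates built that day
to also use the dimension table associated with that last build. What I want
is for the dimension table data to change with each build while the data
already built that day stays unchanged. Is there a way to achieve this?

李甜彪
ltb1...@163.com

 Replied Message 
From: Xiaoxiang Yu
Date: 8/2/2023 11:31
Subject: Re:measure column showing 1 as values instead of the actual values
in Kylin SQL Query answer table

Have you checked the answer with Hive/SparkSQL? Do Hive/SparkSQL give the
expected answer?
If you have, could you share your cube definition (CubeDesc in JSON) and the
SQL statement you ran, so we can discuss it in detail?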







--

Best wishes to you ! 
From :Xiaoxiang Yu




At 2023-08-02 04:35:04, "marc nicole"  wrote:

The measure is of type column (not constant) and is bigint. I selected the 
measure from the dropdown corretly as well. measure column returns 1 for all 
column values instead of actual values when querying the cube. What could be 
the underlying problem? cube or model defining? or maybe data source attribute 
types?

Maybe I should create a lookup table with the fact table (which I am not doing 
so far)?


Why the measure column in query answer is showing only 1 as values ??








Re: measure column showing 1 as values instead of the actual values in Kylin SQL Query answer table

2023-08-02 Thread P.F. ZHAN
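In that case, shouldn't you set the columns you need to query as dimensions,
so that they are precomputed and stored in the cube? Dimension data in the
cube does not change unless it is refreshed.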


On Wed, Aug 2, 2023 at 11:35 李甜彪  wrote:

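> Could you help me with a problem I have run into? How can I stop the
> dimension table snapshot from always taking the latest data?
> While using Kylin 4.0.3, when I build data for different date partitions
> on the same day, the dimension table has to be switched between builds.
> However, the last build of the day causes the data for the other dates
> built that day to also use the dimension table associated with that last
> build. What I want is for the dimension table data to change with each
> build while the data already built that day stays unchanged. Is there a
> way to achieve this?
> 李甜彪
> ltb1...@163.com
>
>  Replied Message 
> From: Xiaoxiang Yu
> Date: 8/2/2023 11:31
> Subject: Re:measure column showing 1 as values instead of the actual
> values in Kylin SQL Query answer table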
> Do you check the answer by Hive/SparkSQL, do Hive/SparkSQL give the
> expected answer?
> If you have checked, I think you can give us you cube defination(CubeDesc
> in Json) and the SQL statement you queried, so we can discuss in detail?
>
>
>
> --
> *Best wishes to you ! *
> *From :**Xiaoxiang Yu*
>
>
> At 2023-08-02 04:35:04, "marc nicole"  wrote:
>
> The measure is of type column (not constant) and is bigint. I selected the
> measure from the dropdown corretly as well. measure column returns 1 for
> all column values instead of actual values when querying the cube. What
> could be the underlying problem? cube or model defining? or maybe data
> source attribute types?
>
> Maybe I should create a lookup table with the fact table (which I am not
> doing so far)?
>
> Why the measure column in query answer is showing only 1 as values ??
>
>
>
>


Re: measure column showing 1 as values instead of the actual values in Kylin SQL Query answer table

2023-08-01 Thread 李甜彪
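Could you help me with a problem I have run into? How can I stop the
dimension table snapshot from always taking the latest data?
While using Kylin 4.0.3, when I build data for different date partitions on
the same day, the dimension table has to be switched between builds. However,
the last build of the day causes the data for the other dates built that day
to also use the dimension table associated with that last build. What I want
is for the dimension table data to change with each build while the data
already built that day stays unchanged. Is there a way to achieve this?

李甜彪
ltb1...@163.com

 Replied Message 
From: Xiaoxiang Yu
Date: 8/2/2023 11:31
Subject: Re:measure column showing 1 as values instead of the actual values
in Kylin SQL Query answer table

Have you checked the answer with Hive/SparkSQL? Do Hive/SparkSQL give the
expected answer?
If you have, could you share your cube definition (CubeDesc in JSON) and the
SQL statement you ran, so we can discuss it in detail?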







--

Best wishes to you ! 
From :Xiaoxiang Yu




At 2023-08-02 04:35:04, "marc nicole"  wrote:

The measure is of type column (not constant) and is bigint. I selected the 
measure from the dropdown corretly as well. measure column returns 1 for all 
column values instead of actual values when querying the cube. What could be 
the underlying problem? cube or model defining? or maybe data source attribute 
types?

Maybe I should create a lookup table with the fact table (which I am not doing 
so far)?


Why the measure column in query answer is showing only 1 as values ??








Re:measure column showing 1 as values instead of the actual values in Kylin SQL Query answer table

2023-08-01 Thread Xiaoxiang Yu
Have you checked the answer with Hive/SparkSQL? Do Hive/SparkSQL give the
expected answer?
If you have, could you share your cube definition (CubeDesc in JSON) and the
SQL statement you ran, so we can discuss it in detail?







--

Best wishes to you ! 
From :Xiaoxiang Yu




At 2023-08-02 04:35:04, "marc nicole"  wrote:

The measure is of type column (not constant) and is bigint. I selected the 
measure from the dropdown corretly as well. measure column returns 1 for all 
column values instead of actual values when querying the cube. What could be 
the underlying problem? cube or model defining? or maybe data source attribute 
types?

Maybe I should create a lookup table with the fact table (which I am not doing 
so far)?


Why the measure column in query answer is showing only 1 as values ??








measure column showing 1 as values instead of the actual values in Kylin SQL Query answer table

2023-08-01 Thread marc nicole
The measure is of type column (not constant) and its return type is bigint.
I also selected the measure from the dropdown correctly. However, when I
query the cube, the measure column returns 1 for every row instead of the
actual values. What could be the underlying problem: the cube or model
definition, or maybe the data source attribute types?

Maybe I should create a lookup table alongside the fact table (which I am
not doing so far)?

Why does the measure column in the query result show only 1 as values?

