
Re: Email from suoli

2024-04-10 Thread suoli

On 2024-04-11 13:50:20, "suoli" wrote:

Re: How to query the Cube via API and use the dataset for other purpose

2024-04-03 Thread Nam Đỗ Duy via user

Thank you very much for your response. I asked a pro for help, and below
is the sample code, written against the sample SSB project, which I would
like to contribute to help anyone who has the same issue as me:

==


import org.apache.spark.sql.types.{StringType, StructField, StructType}
import org.apache.spark.sql.{Row, SparkSession}
import org.json4s.jackson.JsonMethods
import org.json4s.{DefaultFormats, Formats}

import java.io.{BufferedReader, DataOutputStream, InputStreamReader}
import java.net.{HttpURLConnection, URL}
import java.util.Base64

object APIKylinRunSQL {

  val KYLIN_QUERY_URL = "http://localhost:7070/kylin/api/query"
  val USER_NAME = "x"
  val PASSWORD = "y"
  val KYLIN_PROJECT = "learn_kylin"

  val spark = SparkSession.builder
.master("local")
.appName("Convert JSON to DataFrame")
.getOrCreate()

  def main(args: Array[String]): Unit = {


// A Seq of pairs rather than a Map: a Map literal keeps only the last value
// for a repeated key, so one query each for PART and P_LINEORDER would be dropped.
val tablesAndQueries = Seq(
  "CUSTOMER" -> "select * from SSB.CUSTOMER",
  "DATES" -> "SELECT * FROM SSB.DATES",
  "PART" -> "SELECT * FROM SSB.PART",
  "P_LINEORDER" -> "SELECT * FROM SSB.P_LINEORDER",
  "SUPPLIER" -> "SELECT * FROM SSB.SUPPLIER",
  "P_LINEORDER" -> "SELECT lo_orderdate, count(1) FROM SSB.P_LINEORDER GROUP BY lo_orderdate",
  "PART" -> "SELECT P_COLOR, count(1) FROM SSB.PART group by P_COLOR"
)

// query times
val numberOfExecutions = 15

// loop query
for (i <- 1 to numberOfExecutions) {
  println(s"Executing query $i")
  for ((table, query) <- tablesAndQueries) {
println(s"Executing queries for table $table")

println(query)

executeQuery(query)
// wait one second between queries
Thread.sleep(1000)
  }
}

  }

  def executeQuery(sqlQuery: String): Unit = {

val queryJson =
  s"""
 |{
 |  "project": "$KYLIN_PROJECT",
 |  "sql": "$sqlQuery"
 |}
 |""".stripMargin

// Encode the username and password for basic authentication
val encodedAuth =
Base64.getEncoder.encodeToString(s"$USER_NAME:$PASSWORD".getBytes)

val url = new URL(KYLIN_QUERY_URL)
val connection = url.openConnection.asInstanceOf[HttpURLConnection]

connection.setRequestMethod("POST")
connection.setRequestProperty("Authorization", s"Basic $encodedAuth")
connection.setRequestProperty("Content-Type", "application/json")
connection.setRequestProperty("Accept", "application/json")
connection.setDoOutput(true)

val outputStream = connection.getOutputStream
val writer = new DataOutputStream(outputStream)
writer.write(queryJson.getBytes("UTF-8"))
writer.flush()
writer.close()

val responseCode = connection.getResponseCode

if (responseCode == HttpURLConnection.HTTP_OK) {
  val inputStream = connection.getInputStream
  val reader = new BufferedReader(new InputStreamReader(inputStream))
  var inputLine: String = null
  val response = new StringBuilder

  while ( {
inputLine = reader.readLine;
inputLine != null
  }) {
response.append(inputLine)
  }
  reader.close()
  println("Result:")
  println(response.toString)

  connection.disconnect()

  // parse JSON
  implicit val formats: Formats = DefaultFormats
  val parsedJson = JsonMethods.parse(response.toString)

  val columns = (parsedJson \ "columnMetas")
.extract[List[Map[String, Any]]]

  // dynamically build the schema based on column name information in JSON
  val schema = StructType(columns.map { col =>
val columnName = col("name").asInstanceOf[String]
StructField(columnName, StringType, nullable = true)
  })

  schema.printTreeString()

  // get data from JSON
  val data = (parsedJson \ "results").extract[List[List[Any]]]

  // convert data to RDD[Row]
  val rowsRDD = spark.sparkContext.parallelize(data.map(row =>
Row.fromSeq(row.map(_.asInstanceOf[AnyRef]))))

  val df = spark.createDataFrame(rowsRDD, schema)

  df.show(20, false)

} else {
  println(s"Error: $responseCode")
  connection.disconnect()
}
  }
}
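
One caveat with the sample above: it splices `$sqlQuery` straight into the JSON body, so a query containing a double quote, a backslash, or a line break would likely produce invalid JSON. Below is a minimal sketch of escaping the values first, in plain Scala with no extra dependencies. The names `KylinQueryBody`, `escapeJson`, and `build` are hypothetical; they are not part of the original sample or of Kylin.

```scala
object KylinQueryBody {

  // Escape characters that may not appear verbatim inside a JSON string literal.
  def escapeJson(s: String): String = {
    val sb = new StringBuilder
    s.foreach {
      case '"'  => sb.append("\\\"")
      case '\\' => sb.append("\\\\")
      case '\n' => sb.append("\\n")
      case '\r' => sb.append("\\r")
      case '\t' => sb.append("\\t")
      // Remaining control characters become a backslash-u escape with four hex digits.
      case c if c < ' ' => sb.append("\\" + "u%04x".format(c.toInt))
      case c    => sb.append(c)
    }
    sb.toString
  }

  // Build the request body with the same two fields the sample sends
  // ("project" and "sql"); hypothetical helper, shown only as a sketch.
  def build(project: String, sql: String): String =
    "{\"project\": \"" + escapeJson(project) + "\", \"sql\": \"" + escapeJson(sql) + "\"}"
}
```

In the sample, `queryJson` could then be assigned from `KylinQueryBody.build(KYLIN_PROJECT, sqlQuery)` in place of the interpolated string.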


On Sun, Mar 31, 2024 at 8:57 PM Lionel CL  wrote:

> Hi Nam,
> You can refer to the spark docs
> https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html
>
> Regards,
> Lu Cao
>
> From: Nam Đỗ Duy 
> Date: Sunday, March 31, 2024 at 08:53
> To: dev , user@kylin.apache.org <
> user@kylin.apache.org>
> Subject: Re: How to query the
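
For readers following the Spark JDBC pointer in the reply above, here is a minimal sketch of the options one might pass to Spark's JDBC data source for Kylin. It assumes the Kylin JDBC driver (`org.apache.kylin.jdbc.Driver`) is on the classpath and that the URL follows the `jdbc:kylin://host:port/project` form; whether the SQL that Spark generates is accepted by a given Kylin version is untested here. `KylinJdbcOptions` and its methods are hypothetical helper names.

```scala
object KylinJdbcOptions {

  // Kylin JDBC URL form: jdbc:kylin://<host>:<port>/<project>
  def url(host: String, port: Int, project: String): String =
    s"jdbc:kylin://$host:$port/$project"

  // Option keys are the ones Spark's JDBC data source documents:
  // url, driver, user, password, query.
  def options(host: String, port: Int, project: String,
              user: String, password: String, query: String): Map[String, String] =
    Map(
      "url"      -> url(host, port, project),
      "driver"   -> "org.apache.kylin.jdbc.Driver",
      "user"     -> user,
      "password" -> password,
      "query"    -> query
    )
}

// Usage, assuming an existing SparkSession named `spark`:
//   val df = spark.read.format("jdbc")
//     .options(KylinJdbcOptions.options("localhost", 7070, "learn_kylin",
//       "x", "y", "SELECT P_COLOR, count(1) FROM SSB.PART GROUP BY P_COLOR"))
//     .load()
```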

Re: How to query the Cube via API and use the dataset for other purpose

2024-03-30 Thread Nam Đỗ Duy via user

Dear Sirs/Madams,

Could anyone here help me figure out how to use Scala to run a SELECT
query against a Kylin cube via the API and then turn the result table into
a DataFrame in Scala for other purposes?

Thank you so much for your time!

Best regards

On Fri, 29 Mar 2024 at 17:52 Nam Đỗ Duy  wrote:

> Hi Xiaoxiang,
> Sirs & Madams,
>
> I use the following code to query the cube via the API, but I cannot use
> the result as a DataFrame. Could you suggest a way to do that? It is
> very important for our project.
>
> Thanks and best regards
>
> ===
>
> import org.apache.spark.sql.{DataFrame, SparkSession}
> import org.apache.spark.sql.functions._
>
> object APICaller {
>   def main(args: Array[String]): Unit = {
> val spark = SparkSession.builder()
>   .appName("APICaller")
>   .master("local[*]")
>   .getOrCreate()
>
> import spark.implicits._
>
> val username = "namdd"
> val password = "eer123"
> val urlString = "http://localhost:7070/kylin/api/query"
> val project = "learn_kylin"
> val query = "select count(*) from HIVE_DWH_STANDARD.factuserEvent"
>
> val response: String = callAPI(urlString, username, password, project,
> query)
>
> // Convert response to DataFrame
> val df = spark.read.json(Seq(response).toDS())
>
> // Show DataFrame
> df.show()
>
> // Stop Spark session
> spark.stop()
>   }
>
>   def callAPI(url: String, username: String, password: String, project:
> String, query: String): String = {
> val encodedAuth =
> java.util.Base64.getEncoder.encodeToString(s"$username:$password".getBytes)
>
> val connection = scalaj.http.Http(url)
>   .postData(s"""{"project": "$project", "sql": "$query"}""")
>   .header("Content-Type", "application/json")
>   .header("Accept", "application/json")
>   .auth(username, password)
>   .asString
>
> if (connection.isError)
>   throw new RuntimeException(s"Error calling API: ${connection.body}")
>
> connection.body
>   }
> }
>
>

Re: Pinot/Kylin/Druid quick comparision

2024-03-17 Thread Nam Đỗ Duy via user

Thank you Li Yang, I think the development of version 5 would be hard
work for you but the impact is big so please keep me posted!

All the best


Re: Pinot/Kylin/Druid quick comparision

2024-03-13 Thread Li Yang

Nam,

We are planning to release a kylin5-beta around March or April. The GA of
kylin5 would be around July this year if everything goes well.

Cheers
Yang


Re: Pinot/Kylin/Druid quick comparision

2024-03-05 Thread Nam Đỗ Duy via user

Hello Xiaoxiang,

How are you? My boss is very interested in Kylin 5, so he would like to
know when it will be released. Could you please provide an estimate?

Thank you very much and best regards






Re: kylin4_on_cloud deployment errors

2024-02-07 Thread John W

I followed the troubleshooting instructions at:
https://github.com/apache/kylin/blob/kylin4_on_cloud/readme/trouble_shooting.md#kylin-can-not-access-and-exception-session-0x0-for-server-null-unexpected-error-closing-socket-connection-and-attempting-reconnect-is-in-kylinlog

When logging into the zookeeper instances, the .bash_profile file looks
standard and there is no reference to $ZOOKEEPER_HOME

[root@ip-172-27-32-153 ec2-user]# cat .bash_profile
--
# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi

# User specific environment and startup programs

PATH=$PATH:$HOME/.local/bin:$HOME/bin

export PATH
---

So the deployment did not configure the zookeeper instances at all. Would
anyone know how to fix this?

Any help appreciated.
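
For context, the deployment log quoted below shows the script trying to append three `server.N=ip:2888:3888` lines to `zoo.cfg`, which fails because the file does not exist. The sketch below only illustrates the shape of those entries so they can be checked by hand on each instance; it does not diagnose why the file is missing. `ZooCfgServers` and `serverLines` are hypothetical names, and the default ports are the ones that appear in the log.

```scala
object ZooCfgServers {

  // Produce one "server.<id>=<ip>:<peerPort>:<electionPort>" line per node,
  // numbering the nodes from 1 in the order given.
  def serverLines(ips: Seq[String], peerPort: Int = 2888, electionPort: Int = 3888): String =
    ips.zipWithIndex
      .map { case (ip, i) => s"server.${i + 1}=$ip:$peerPort:$electionPort" }
      .mkString("\n")
}
```

Calling `ZooCfgServers.serverLines(Seq("10.1.0.133", "10.1.0.129", "10.1.0.58"))` yields the same three lines that the failed `echo ... >> zoo.cfg` command in the log was meant to write.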

On Wed, 7 Feb 2024 at 04:33, John W  wrote:

> Hi, I'm having problems deploying the kylin4_on_cloud project located at:
> https://github.com/apache/kylin/tree/kylin4_on_cloud
>
> I've also been following the instructions here
> https://www.youtube.com/watch?v=5kKXEMjO1Sc_channel=Kyligence
>
> I used windows to git clone the repo and set up the venv with the latest
> packages via:
> pip install PyYAML
> pip install boto3
> pip install botocore
> pip install pyparsing
> pip install requests
> pip install retrying
> pip install Jinja2
> pip install pytest-shutil
> :
> I also changed the RDSEngineVersion to 8.0.35 in kylin_configs.yaml, as
> RDSEngineVersion 5.7.25 (default repo version) was giving me the error
> "Exception: Current stack: ec2-rds-stack is create failed, please check".
>
> Here's the log with error I am now getting:
>
> ==
> (venv) C:\projects\kylin4_on_cloud>python deploy.py --type deploy --mode
> job
> 2024-02-07 02:13:54 - botocore.credentials - INFO - 5484 - Found
> credentials in shared credentials file: ~/.aws/credentials
> 2024-02-07 02:13:57 - engine - INFO - 5484 - Env already inited, skip init
> again.
> 2024-02-07 02:13:58 - clouds.aws - WARNING - 5484 - Current env for
> deploying a cluster is not ready.
> 2024-02-07 02:14:20 - instances.aws_instance - INFO - 5484 - Now creating
> stack: ec2-or-emr-vpc-stack.
> 2024-02-07 02:16:42 - instances.aws_instance - INFO - 5484 - Now creating
> stack: ec2-rds-stack.
> 2024-02-07 02:21:06 - instances.aws_instance - INFO - 5484 - Now creating
> stack: ec2-static-service-stack.
> 2024-02-07 02:21:06 - engine - INFO - 5484 - First launch default Kylin
> Cluster.
> 2024-02-07 02:22:08 - clouds.aws - WARNING - 5484 - Current cluster is not
> ready.
> 2024-02-07 02:22:30 - instances.aws_instance - INFO - 5484 - Now creating
> stack: ec2-zookeeper-stack.
> 2024-02-07 02:23:43 - instances.aws_instance - INFO - 5484 - Current
> execute commands in `Zookeeper stack` which named ec2-zookeeper-stack.
> 2024-02-07 02:23:43 - instances.aws_instance - INFO - 5484 - Current
> instance id: i-0cbc37f83c9cda006 is executing commands: grep -Fq
> "10.1.0.133:2888:3888" /home/ec2-user/hadoop/zookeeper/conf/zoo.cfg; echo
> $?.
> 2024-02-07 02:23:49 - instances.aws_instance - INFO - 5484 - Current
> instance id: i-0915d44c700e644dc is executing commands: grep -Fq
> "10.1.0.129:2888:3888" /home/ec2-user/hadoop/zookeeper/conf/zoo.cfg; echo
> $?.
> 2024-02-07 02:23:54 - instances.aws_instance - INFO - 5484 - Current
> instance id: i-0fdbacc22ecae360a is executing commands: grep -Fq
> "10.1.0.58:2888:3888" /home/ec2-user/hadoop/zookeeper/conf/zoo.cfg; echo
> $?.
> 2024-02-07 02:24:00 - instances.aws_instance - INFO - 5484 - Current
> instance id: i-0cbc37f83c9cda006 is executing commands: echo
> 'server.1=10.1.0.133:2888:3888
> server.2=10.1.0.129:2888:3888
> server.3=10.1.0.58:2888:3888' >>
> /home/ec2-user/hadoop/zookeeper/conf/zoo.cfg.
> 2024-02-07 02:24:05 - instances.aws_instance - WARNING - 5484 -
> {'CommandId': '704b776f-e574-47ea-bf13-30d3be2e9df2', 'InstanceId':
> 'i-0cbc37f83c9cda006', 'Comment': '', 'DocumentName': 'AWS-RunShellScript',
> 'DocumentVersion': '$DEFAULT', 'PluginName': 'aws:runShellScript',
> 'ResponseCode': 1,
> 'ExecutionStartDateTime': '2024-02-06T16:24:00.394Z',
> 'ExecutionElapsedTime': 'PT0.008S', 'ExecutionEndDateTime':
> '2024-02-06T16:24:00.394Z', 'Status': 'Failed', 'StatusDetails': 'Failed',
> 'StandardOutputContent': '', 'StandardOutputUrl': '',
> 'StandardErrorContent':
> '/var/lib/amazon/ssm/i-0cbc37f83c9cda006/document/orchestration/704b776f-e574-47ea-bf13-30d3be2e9df2/awsrunShellScript/0.awsrunShellScript/_script.sh:
> line 3: /home/ec2-user/hadoop/zookeeper/conf/zoo.cfg: No such file or
> directory\nfailed to run commands: exit status 1', 'StandardErrorUrl': '',
> 'CloudWatchOutputConfig': {'CloudWatchLogGroupName': '',
> 'CloudWatchOutputEnabled': False}, 'ResponseMetadata': {'RequestId':
> '133ea7d8-d661-4ea0-960d-349b294dd8a9', 'HTTPStatusCode': 200,
> 'HTTPHeaders': {'server': 'Server', 'date': 'Tue, 06 Feb 2024 16:24:05

Re: Pinot/Kylin/Druid quick comparision

2024-01-17 Thread Nam Đỗ Duy via user

Good morning Xiaoxiang, hope you are well

1. JDBC source is a feature which in development, it will be supported
later.

===

May I know when the JDBC source will be available, and whether there is
any change in the Kylin 5 release date?

Thank you and best regards


On Mon, Dec 11, 2023 at 2:15 PM Xiaoxiang Yu  wrote:

> 1. JDBC source is a feature which in development, it will be supported
> later.
>
> 2. Kylin supports kerberos now, I will write a doc as soon as possible.
> (I will let you know.)
>
> 3. I think ranger and Kerberos are not doing the same things, one for
> authentication, one for authorization. So they cannot replace each other.
> Ranger can integrate with Kerberos, please check ranger's website for
> information.
>
> 
> With warm regard
> Xiaoxiang Yu
>
>
>
> On Sat, Dec 9, 2023 at 8:01 AM Nam Đỗ Duy  wrote:
>
> > Thank you Xiaoxiang for your reply
> >
> > -
> > Do you have any suggestions/wishes for kylin 5(except real-time feature)?
> > -
> > Yes: please answer to help me clear this headache:
> >
> > 1. Can Kylin access the existing star schema in Oracle datawarehouse ? If
> > not then do we have any work around?
> >
> > 2. My team is using kerberos for authentication, do you have any
> > document/casestudy about integrating kerberos with kylin 4.x and kylin
> > 5.x
> >
> > 3. Should we use apache ranger instead of kerberos for authentication and
> > for security purposes?
> >
> > Thank you again
> >
> > On Thu, 7 Dec 2023 at 15:00 Xiaoxiang Yu  wrote:
> >
> > > I guess the release date should be 2024/01 .
> > > Do you have any suggestions/wishes for kylin 5(except real-time
> > > feature)?
> > >
> > > 
> > > With warm regard
> > > Xiaoxiang Yu
> > >
> > >
> > >
> > > On Thu, Dec 7, 2023 at 3:44 PM Nam Đỗ Duy wrote:
> > >
> > >> Thank you very much xiaoxiang, I did the presentation this morning
> > >> already so there is no time for you to comment. Next time I will send
> > >> you in advance. The meeting result was that we will implement both
> > >> druid and kylin in the next couple of projects because of its realtime
> > >> feature. Hope that kylin will have same feature soon.
> > >>
> > >> May I ask when will you release kylin 5.0?
> > >>
> > >> On Thu, Dec 7, 2023 at 9:26 AM Xiaoxiang Yu  wrote:
> > >>
> > >> > Since 2018 there are a lot of new features and code refactor.
> > >> > If you like, you can share your ppt to me privately, maybe I can
> > >> > give some comments.
> > >> >
> > >> > Here is the reference of advantages of Kylin since 2018:
> > >> > - https://kylin.apache.org/blog/2022/01/12/The-Future-Of-Kylin/
> > >> > - https://kylin.apache.org/blog/2021/07/02/Apache-Kylin4-A-new-storage-and-compute-architecture/
> > >> > - https://kylin.apache.org/5.0/docs/development/roadmap
> > >> >
> > >> > 
> > >> > With warm regard
> > >> > Xiaoxiang Yu
> > >> >
> > >> >
> > >> >
> > >> > On Wed, Dec 6, 2023 at 6:53 PM Nam Đỗ Duy wrote:
> > >> >
> > >> >> Hi Xiaoxiang, tomorrow is the main presentation between Kylin and
> > >> >> Druid in my team.
> > >> >>
> > >> >> I found this article and would like you to update me the advantages
> > >> >> of Kylin since 2018 until now (especially with version 5 to be
> > >> >> released)
> > >> >>
> > >> >> Apache Kylin | Why did Meituan develop Kylin On Druid (part 1 of 2)?
> > >> >> <https://kylin.apache.org/blog/2018/12/12/why-did-meituan-develop-kylin-on-druid-part1-of-2/>
> > >> >>
> > >> >> On Wed, Dec 6, 2023 at 9:34 AM Nam Đỗ Duy  wrote:
> > >> >>
> > >> >> > Thank you very much for your prompt response, I still have
> > >> >> > several questions to seek for your help later.
> > >> >> >
> > >> >> > Best regards and have a good day
> > >> >> >
> > >> >> >
> > >> >> >
> > >> >> > On Wed, Dec 6, 2023 at 9:11 AM Xiaoxiang Yu wrote:
> > >> >> >
> > >> >> >> Done. Github branch changed to kylin5.
> > >> >> >>
> > >> >> >> 
> > >> >> >> With warm regard
> > >> >> >> Xiaoxiang Yu
> > >> >> >>
> > >> >> >>
> > >> >> >>
> > >> >> >> On Tue, Dec 5, 2023 at 11:13 AM Xiaoxiang Yu wrote:
> > >> >> >>
> > >> >> >> > A JIRA ticket has been opened, waiting for INFRA :
> > >> >> >> > https://issues.apache.org/jira/browse/INFRA-25238 .
> > >> >> >> > 
> > >> >> >> > With warm regard
> > >> >> >> > Xiaoxiang Yu
> > >> >> >> >
> > >> >> >> >
> > >> >> >> >
> > >> >> >> > On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy wrote:
> > >> >> >> >
> > >> >> >> >> Thank you Xiaoxiang, please update me when you have changed your
> > >> >> >> >> default branch. In case people are impressed by the numbers then I
> > >> >> >> >> hope to turn this situation to reverse direction.
> > >> >> >> >>
> > >> >> >> >> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu wrote:

Re: Unsubscribe

2023-12-17 Thread lee

Unsubscribe

> On December 17, 2023 at 16:48, 怪侠一枝梅 wrote:
> 
> Unsubscribe

Re: Unsubscribe

2023-12-15 Thread Yanjing Wang

Unsubscribe

313463...@qq.com <313463...@qq.com> wrote on Wed, Dec 13, 2023 at 17:01:

> Unsubscribe
>
> --
> 313463...@qq.com
> 313463...@qq.com
>
> 
>
>
>
> -- Original Message --
> *From:* "user" ;
> *Sent:* Wednesday, December 13, 2023, 4:54 PM
> *To:* "user";
> *Cc:* "dev";
> *Subject:* Unsubscribe
>
> Unsubscribe
>

Re: ACID with Hive/Kylin

2023-12-12 Thread Nam Đỗ Duy via user

Thank you both for your valuable information. I will test and report back
soon.

Best regards

On Tue, Dec 12, 2023 at 2:39 PM Xiaoxiang Yu  wrote:

> I don't know GDPR very well. Here is my understanding.
>
> For Hive and HDFS, you can consider using these techniques which support
> ACID in Spark and Hive (I recommend the first one):
> 1) Delta Lake,
> https://docs.databricks.com/en/security/privacy/gdpr-delta.html
> 2) Hive ACID table, here is a link,
>
> https://docs.cloudera.com/cdp-private-cloud-upgrade/latest/migrate-hive-workloads/topics/hive-acid-migration-regulations.html
>
> For Kylin, there are three places which may store data: index, snapshot,
> and dict. Refreshing a snapshot costs less time and fewer resources, while
> refreshing the index or dict costs much more. Snapshot refresh is triggered
> automatically when you build an index every day.
>
> I think you should consider centralizing user-sensitive columns(email,
> phone, address) in dimension tables,
> and your fact table only has the foreign key(for example, uid) which refers
> to the primary key of dimension tables.
> When you are modeling in Kylin, for these dim tables which contains
> user-sensitive columns, try
>
> 1. set dim tables as snapshot by disable precompute join relation, so these
> columns won't be built into indexes, refer
>
> https://kylin.apache.org/5.0/docs/modeling/model_design/precompute_join_relations
> 2. not create a bitmap measure on these columns, so these columns won't be
> built into dict
>
> 
> With warm regard
> Xiaoxiang Yu
>
>
>
> On Tue, Dec 12, 2023 at 12:11 PM Nam Đỗ Duy 
> wrote:
>
> > Dear Xiaoxiang, Sirs/Madams
> >
> > I face an issue with deleting data of user according to GPDR-like policy
> > which means when user send request to delete their personal data, we need
> > to delete it from all system, that means to delete data:
> >
> > 1- from Kylin index (cube)
> > 2- from Hive
> > 3- from HDFS
> >
> > Have you had the same use-case before, do you have any suggestions to
> > achieve this scenario?
> >
> > Thank you very much and best regards
> >
>

Re: ACID with Hive/Kylin

2023-12-11 Thread Xiaoxiang Yu

I don't know GDPR very well, so here is my understanding.

For Hive and HDFS, you can consider these techniques, which support
ACID in Spark and Hive (I recommend the first one):
1) Delta Lake:
https://docs.databricks.com/en/security/privacy/gdpr-delta.html
2) Hive ACID tables:
https://docs.cloudera.com/cdp-private-cloud-upgrade/latest/migrate-hive-workloads/topics/hive-acid-migration-regulations.html

In Kylin, there are three places that may store data: the index, the
snapshot, and the dict. Refreshing a snapshot costs little time and few
resources, while refreshing an index or dict costs much more. A snapshot
refresh is triggered automatically when you build an index, for example
every day.

I think you should consider centralizing user-sensitive columns (email,
phone, address) in dimension tables, so that your fact table only holds a
foreign key (for example, uid) that refers to the primary key of a
dimension table.
When you model in Kylin, for the dimension tables that contain
user-sensitive columns, try the following:

1. Set the dimension tables as snapshots by disabling the precomputed join
relation, so these columns won't be built into indexes. See
https://kylin.apache.org/5.0/docs/modeling/model_design/precompute_join_relations
2. Do not create a bitmap measure on these columns, so these columns won't
be built into the dict.
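
To illustrate the modeling advice above, here is a minimal sketch in plain
Scala. It is only an illustration under stated assumptions: in-memory
collections stand in for the Hive/Kylin tables, no Kylin, Hive, or Spark API
is called, and every name in it (UserDim, OrderFact, eraseUser,
revenueByDate, emailFor) is hypothetical. It shows why keeping sensitive
columns only in a dimension table helps: erasing a user touches the dimension
rows, while aggregates computed from the fact table are unchanged.

```scala
// Illustrative sketch only: plain Scala collections stand in for Hive/Kylin tables.
// All names (UserDim, OrderFact, eraseUser, ...) are hypothetical.
object PiiModelSketch {

  // Dimension table: the only place that holds user-sensitive columns.
  final case class UserDim(uid: Long, email: String, phone: String, address: String)

  // Fact table: carries the foreign key uid and measures, but no sensitive columns.
  final case class OrderFact(orderId: Long, uid: Long, orderDate: String, amount: BigDecimal)

  // Erasing a user only touches the dimension table.
  def eraseUser(dim: Seq[UserDim], uid: Long): Seq[UserDim] =
    dim.filterNot(_.uid == uid)

  // An aggregate of the kind that could be precomputed: revenue per day.
  // It reads the fact table only, so erasing a dimension row does not change it.
  def revenueByDate(facts: Seq[OrderFact]): Map[String, BigDecimal] =
    facts.groupBy(_.orderDate).map { case (date, rows) => date -> rows.map(_.amount).sum }

  // Lookup at query time: facts whose user was erased resolve to None.
  def emailFor(dim: Seq[UserDim], fact: OrderFact): Option[String] =
    dim.find(_.uid == fact.uid).map(_.email)

  def main(args: Array[String]): Unit = {
    val users = Seq(
      UserDim(1L, "a@example.com", "111", "Addr A"),
      UserDim(2L, "b@example.com", "222", "Addr B")
    )
    val orders = Seq(
      OrderFact(100L, 1L, "2023-12-01", BigDecimal(10)),
      OrderFact(101L, 2L, "2023-12-01", BigDecimal(5)),
      OrderFact(102L, 1L, "2023-12-02", BigDecimal(7))
    )

    val afterErase = eraseUser(users, 1L)
    println(revenueByDate(orders))              // aggregates are unaffected by the erase
    println(emailFor(afterErase, orders.head))  // the erased user's email is gone
  }
}
```

In a real deployment the erase step would be a delete on the source dimension
table followed by a snapshot refresh; the sketch only models the data layout.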

With warm regard
Xiaoxiang Yu

On Tue, Dec 12, 2023 at 12:11 PM Nam Đỗ Duy  wrote:

> Dear Xiaoxiang, Sirs/Madams
>
> I face an issue with deleting user data under a GDPR-like policy: when a
> user sends a request to delete their personal data, we need to delete it
> from every system, which means deleting the data:
>
> 1- from Kylin index (cube)
> 2- from Hive
> 3- from HDFS
>
> Have you had the same use case before? Do you have any suggestions for
> achieving this?
>
> Thank you very much and best regards
>

Re: ACID with Hive/Kylin

2023-12-11 Thread ShaoFeng Shi

Hi Nam,

As Kylin is used to store aggregated data, it should hold no PII. (Using
Kylin to manage person-level data is not a good use case.)

If you do need to delete certain personal data, what you can do is refresh
the whole index or some partitions.
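
As a rough illustration of the "some partitions" option, the sketch below
works out which partitions contain rows of the user being erased. It assumes
the index is partitioned by a date column stored as an ISO date string; the
names FactRow and partitionsToRefresh are hypothetical, and no Kylin API is
called.

```scala
// Illustrative sketch only; names are hypothetical and no Kylin API is called.
object PartitionRefreshSketch {

  // A fact row reduced to what matters here: its owner and its partition date.
  final case class FactRow(uid: Long, partitionDate: String)

  // Distinct, sorted partition dates that contain at least one row of the given user.
  // These are the partitions that would need a refresh after the source rows are deleted.
  def partitionsToRefresh(facts: Seq[FactRow], uid: Long): Seq[String] =
    facts.filter(_.uid == uid).map(_.partitionDate).distinct.sorted

  def main(args: Array[String]): Unit = {
    val facts = Seq(
      FactRow(1L, "2023-12-02"),
      FactRow(2L, "2023-12-01"),
      FactRow(1L, "2023-12-01"),
      FactRow(1L, "2023-12-02")
    )
    println(partitionsToRefresh(facts, 1L)) // List(2023-12-01, 2023-12-02)
  }
}
```

If the resulting list covers most of the date range, refreshing the whole
index may be simpler than refreshing partitions one by one.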

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC,
Apache Incubator PMC,
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org




On Tue, Dec 12, 2023 at 12:11, Nam Đỗ Duy wrote:

> Dear Xiaoxiang, Sirs/Madams
>
> I face an issue with deleting user data under a GDPR-like policy: when a
> user sends a request to delete their personal data, we need to delete it
> from every system, which means deleting the data:
>
> 1- from Kylin index (cube)
> 2- from Hive
> 3- from HDFS
>
> Have you had the same use case before? Do you have any suggestions for
> achieving this?
>
> Thank you very much and best regards
>

Re: Pinot/Kylin/Druid quick comparision

2023-12-10 Thread Nam Đỗ Duy via user

Thank you very much. Please go ahead with the Kylin-Kerberos document and
the JDBC connectivity; we will actively participate in testing the JDBC
source when it is available, so please let us know.

Best regards


Re: Pinot/Kylin/Druid quick comparision

2023-12-10 Thread Xiaoxiang Yu

1. The JDBC source is a feature that is still in development; it will be
supported later.

2. Kylin supports Kerberos now; I will write a doc as soon as possible.
(I will let you know.)

3. I think Ranger and Kerberos do different things: Kerberos handles
authentication, while Ranger handles authorization, so they cannot replace
each other. Ranger can integrate with Kerberos; please check Ranger's
website for more information.


With warm regard
Xiaoxiang Yu




Re: Pinot/Kylin/Druid quick comparision

2023-12-08 Thread Nam Đỗ Duy via user

Thank you, Xiaoxiang, for your reply.

> Do you have any suggestions/wishes for kylin 5(except real-time feature)?

Yes, please answer these questions to help me clear up this headache:

1. Can Kylin access an existing star schema in an Oracle data warehouse? If
not, is there a workaround?

2. My team uses Kerberos for authentication. Do you have any documents or
case studies about integrating Kerberos with Kylin 4.x and Kylin 5.x?

3. Should we use Apache Ranger instead of Kerberos for authentication and
for security purposes?

Thank you again


Re: Pinot/Kylin/Druid quick comparision

2023-12-07 Thread Xiaoxiang Yu

I guess the release date should be 2024/01.
Do you have any suggestions or wishes for Kylin 5 (other than the real-time
feature)?


With warm regard
Xiaoxiang Yu




Re: Pinot/Kylin/Druid quick comparision

2023-12-06 Thread Nam Đỗ Duy via user

Thank you very much, Xiaoxiang. I already gave the presentation this morning,
so there was no time for you to comment. Next time I will send it to you in
advance. The outcome of the meeting was that we will implement both Druid and
Kylin in the next couple of projects, Druid because of its real-time feature.
I hope Kylin will have the same feature soon.

May I ask when you will release Kylin 5.0?


Re: Pinot/Kylin/Druid quick comparision

2023-12-06 Thread Xiaoxiang Yu

Since 2018 there have been a lot of new features and code refactoring.
If you like, you can share your PPT with me privately; maybe I can
give some comments.

Here are some references on the advantages of Kylin since 2018:
- https://kylin.apache.org/blog/2022/01/12/The-Future-Of-Kylin/
- https://kylin.apache.org/blog/2021/07/02/Apache-Kylin4-A-new-storage-and-compute-architecture/
- https://kylin.apache.org/5.0/docs/development/roadmap


With warm regard
Xiaoxiang Yu




Re: Pinot/Kylin/Druid quick comparision

2023-12-06 Thread Nam Đỗ Duy via user

Hi Xiaoxiang, tomorrow my team holds its main presentation comparing Kylin and
Druid.

I found this article and would appreciate an update on how Kylin's advantages
have developed from 2018 until now (especially with version 5 about to be
released):

Apache Kylin | Why did Meituan develop Kylin On Druid (part 1 of 2)?


On Wed, Dec 6, 2023 at 9:34 AM Nam Đỗ Duy  wrote:

> Thank you very much for your prompt response, I still have several
> questions to seek for your help later.
>
> Best regards and have a good day
>
>
>
> On Wed, Dec 6, 2023 at 9:11 AM Xiaoxiang Yu  wrote:
>
>> Done. Github branch changed to kylin5.
>>
>> 
>> With warm regard
>> Xiaoxiang Yu
>>
>>
>>
>> On Tue, Dec 5, 2023 at 11:13 AM Xiaoxiang Yu  wrote:
>>
>> > A JIRA ticket has been opened, waiting for INFRA :
>> > https://issues.apache.org/jira/browse/INFRA-25238 .
>> > 
>> > With warm regard
>> > Xiaoxiang Yu
>> >
>> >
>> >
>> > On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy 
>> wrote:
>> >
>> >> Thank you Xiaoxiang, please update me when you have changed your
>> default
>> >> branch. In case people are impressed by the numbers then I hope to turn
>> >> this situation to reverse direction.
>> >>
>> >> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu  wrote:
>> >>
>> >>> The default branch is for 4.X which is a maintained branch, the active
>> >>> branch is kylin5.
>> >>> I will change the default branch to kylin5 later.
>> >>>
>> >>> 
>> >>> With warm regard
>> >>> Xiaoxiang Yu
>> >>>
>> >>>
>> >>>

Re: Pinot/Kylin/Druid quick comparision

2023-12-05 Thread Nam Đỗ Duy via user

Thank you very much for your prompt response. I still have several
questions and will ask for your help with them later.

Best regards and have a good day



On Wed, Dec 6, 2023 at 9:11 AM Xiaoxiang Yu  wrote:

> Done. Github branch changed to kylin5.
>
> 
> With warm regard
> Xiaoxiang Yu
>
>
>
> On Tue, Dec 5, 2023 at 11:13 AM Xiaoxiang Yu  wrote:
>
> > A JIRA ticket has been opened, waiting for INFRA :
> > https://issues.apache.org/jira/browse/INFRA-25238 .
> > 
> > With warm regard
> > Xiaoxiang Yu
> >
> >
> >
> > On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy 
> wrote:
> >
> >> Thank you Xiaoxiang, please update me when you have changed your default
> >> branch. In case people are impressed by the numbers then I hope to turn
> >> this situation to reverse direction.
> >>
> >> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu  wrote:
> >>
> >>> The default branch is for 4.X which is a maintained branch, the active
> >>> branch is kylin5.
> >>> I will change the default branch to kylin5 later.
> >>>
> >>> 
> >>> With warm regard
> >>> Xiaoxiang Yu
> >>>
> >>>
> >>>

Re: Pinot/Kylin/Druid quick comparision

2023-12-05 Thread Xiaoxiang Yu

Done. The default GitHub branch has been changed to kylin5.


With warm regard
Xiaoxiang Yu



On Tue, Dec 5, 2023 at 11:13 AM Xiaoxiang Yu  wrote:

> A JIRA ticket has been opened, waiting for INFRA :
> https://issues.apache.org/jira/browse/INFRA-25238 .
> 
> With warm regard
> Xiaoxiang Yu
>
>
>
> On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy  wrote:
>
>> Thank you Xiaoxiang, please update me when you have changed your default
>> branch. In case people are impressed by the numbers then I hope to turn
>> this situation to reverse direction.
>>
>> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu  wrote:
>>
>>> The default branch is for 4.X which is a maintained branch, the active
>>> branch is kylin5.
>>> I will change the default branch to kylin5 later.
>>>
>>> 
>>> With warm regard
>>> Xiaoxiang Yu
>>>
>>>
>>>

Re: Error when building data with kylin4.0.3

2023-12-05 Thread lee

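Unsubscribe

> On Dec 5, 2023, at 19:33, 李甜彪 wrote:
> 
> The build fails with an error. The data in Hive is fine, and a build on
> empty data succeeds. Suspecting a data problem, I hand-wrote a few rows of
> data, but the build failed with the same error, which shows that the
> original data is not the cause.
> The error message shown on the page is as follows: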
> java.io.IOException: OS command error exit with return code: 1, error message: che.spark.rdd.RDD.iterator(RDD.scala:337)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
> at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
> at org.apache.spark.scheduler.Task.run(Task.scala:131)
> at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:498)
> at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:501)
> ... 3 more
> 
> }
> RetryInfo{
>    overrideConf : {},
>    throwable : java.lang.RuntimeException: Error execute org.apache.kylin.engine.spark.job.CubeBuildJob
> at org.apache.kylin.engine.spark.application.SparkApplication.execute(SparkApplication.java:96)
> at org.apache.spark.application.JobWorker$$anon$2.run(JobWorker.scala:55)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 74.0 failed 4 times, most recent failure: Lost task 0.3 in stage 74.0 (TID 186) (store2 executor 20): java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hive.conf.HiveConf$ConfVars
> at org.apache.hadoop.hive.serde2.lazy.LazySerDeParameters.<init>(LazySerDeParameters.java:103)
> at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.initialize(LazySimpleSerDe.java:125)
> at org.apache.spark.sql.hive.HadoopTableReader.$anonfun$makeRDDForTable$3(TableReader.scala:136)
> at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:863)
> at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:863)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
> at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
> at org.apache.spark.scheduler.Task.run(Task.scala:131)
> at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:498)
> at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:501)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 
> Driver stacktrace:
> at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2303)
> at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2252)
> at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2251)
> at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
> at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
> at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2251)
> at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1124)
> at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1124)
> at scala.Option.foreach(Option.scala:407)
> at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1124)
> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2490)
> at 
>

Re: Pinot/Kylin/Druid quick comparision

2023-12-04 Thread Xiaoxiang Yu

A JIRA ticket has been opened and is waiting for INFRA:
https://issues.apache.org/jira/browse/INFRA-25238

With warm regard
Xiaoxiang Yu



On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy  wrote:

> Thank you Xiaoxiang, please update me when you have changed your default
> branch. If people are swayed by those numbers, I hope this will turn the
> situation around.
>
> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu  wrote:
>
>> The default branch is for 4.X, which is a maintenance branch; the active
>> branch is kylin5.
>> I will change the default branch to kylin5 later.
>>
>> 
>> With warm regard
>> Xiaoxiang Yu
>>
>>
>>
>> On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy  wrote:
>>
>>> Hi Xiaoxiang, Sirs / Madams
>>>
>>> Can you see the attached photo?
>>>
>>> My boss asked why Druid commits code regularly while Kylin has had no
>>> commits since July.
>>>
>>>

Re: Pinot/Kylin/Druid quick comparision

2023-12-04 Thread Xiaoxiang Yu

I think so.

Response time is not the only factor in the decision. Kylin can be cheaper
when the query pattern suits the Kylin model, and Kylin can guarantee
reasonable query latency. ClickHouse will be quicker in an ad hoc query
scenario.
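
To make the cost point concrete, here is a minimal sketch of the amortization
arithmetic behind it: a one-off build cost is paid back by cheaper queries
once enough queries are served. It is only an illustration. The function and
parameter names (total_cost, break_even_queries, build_cost,
precomputed_query_cost, adhoc_query_cost) and all figures are hypothetical,
in arbitrary cost units, and are not measurements of Kylin or ClickHouse.

```python
import math
from typing import Optional


def total_cost(upfront: int, per_query: int, n_queries: int) -> int:
    """Total compute cost after n_queries: a one-off cost plus a per-query cost."""
    return upfront + per_query * n_queries


def break_even_queries(build_cost: int,
                       precomputed_query_cost: int,
                       adhoc_query_cost: int) -> Optional[int]:
    """Smallest query count at which precomputing is no more expensive than
    computing every query on the fly, or None if it never pays off."""
    saving_per_query = adhoc_query_cost - precomputed_query_cost
    if saving_per_query <= 0:
        # Precomputed queries are not cheaper, so the build cost is never recovered.
        return None
    return math.ceil(build_cost / saving_per_query)


# Hypothetical figures in arbitrary cost units (assumptions, not benchmarks).
n = break_even_queries(build_cost=1000, precomputed_query_cost=1, adhoc_query_cost=5)
print(n)  # 250
```

With these made-up numbers, a workload of fewer than 250 queries favors
on-the-fly computation, and a larger workload favors precomputation. The
sketch ignores storage cost and how often the precomputed results must be
rebuilt, both of which shift the break-even point.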

By the way, Youzan and Kyligence combine them together to provide
unified data analytics services for their customers.


With warm regard
Xiaoxiang Yu



On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy  wrote:

> Hi Xiaoxiang, thank you
>
> In case my client uses cloud computing service like gcp or aws, which
> will cost more: precalculation feature of kylin or clickhouse (incase of
> kylin, I have a thought that the query execution has been done once and
> stored in cube to be used many times so kylin uses less cloud computation,
> is that true)?
>
> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu  wrote:
>
> > Following text is part of an article(
> > https://zhuanlan.zhihu.com/p/343394287) .
> >
> >
> >
> ===
> >
> > Kylin is suitable for aggregation queries with fixed patterns because of
> > its precomputation technology, for example when the join, group by, and
> > where condition patterns in SQL are relatively fixed. The larger the data
> > volume, the more obvious the advantages of using Kylin. Kylin's
> > advantages are especially large in deduplication (count distinct), Top N,
> > Percentile, and similar scenarios, and it is used in a large number of
> > scenarios, such as dashboards, all kinds of reports, large-screen
> > displays, traffic statistics, and user behavior analysis. Meituan,
> > Aurora, Shell Housing, etc. use Kylin to build their data service
> > platforms, serving millions to tens of millions of queries per day, and
> > most of the queries can be completed within 2 - 3 seconds. There is no
> > better alternative for such a high-concurrency scenario.
> >
> > ClickHouse, because of its MPP architecture, has high computing power and
> > is more suitable when query requests are more flexible, or when there is
> > a need for detailed queries with low concurrency. Scenarios include: user
> > label filtering with arbitrary combinations of very many columns and
> > where conditions, complex ad hoc queries with low concurrency, and so on.
> > If the amount of data and access is large, you need to deploy a
> > distributed ClickHouse cluster, which is a bigger challenge for operation
> > and maintenance.
> >
> > If some queries are very flexible but infrequent, it is more
> > resource-efficient to compute them on the fly. Since the number of
> > queries is small, even if each query consumes a lot of computational
> > resources, it is still cost-effective overall. If some queries have a
> > fixed pattern and the query volume is large, Kylin is more suitable:
> > large computational resources are spent once to save the results, and
> > that upfront computational cost is amortized over every query, so it is
> > the most economical.
> >
> > --- Translated with DeepL.com (free version)
> >
> >
> > 
> > With warm regard
> > Xiaoxiang Yu
> >
> >
> >
> > On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy 
> wrote:
> >
> >> Thank you Xiaoxiang for the near real time streaming feature. That's
> >> great.
> >>
> >> This morning my team faced a new challenge: ClickHouse offered us the
> >> speed of calculating 8 billion rows in milliseconds, which is faster
> >> than my demonstration (I used Kylin to calculate 1 billion rows in 2.9
> >> seconds).
> >>
> >> Can you briefly suggest the advantages of Kylin over ClickHouse so that
> >> I can defend my demonstration?
> >>
> >> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu  wrote:
> >>
> >> > 1. "In this important scenario of realtime analytics, the reason here
> is
> >> > that
> >> > kylin has lag time due to model update of new segment build, is that
> >> > correct?"
> >> >
> >> > You are correct.
> >> >
> >> > 2. "If that is true, then can you suggest a work-around of combination
> >> of
> >> > ... "
> >> >
> >> > Kylin is planning to introduce NRT streaming(coding is completed but
> not
> >> > released),
> >> > which can make the time-lag to about 3 minutes(that is my estimation
> >> but I
> >> > am
> >> > quite certain about it).
> >> > NRT stands for 'near real-time', it will run a job and do micro-batch
> >> > aggregation and persistence periodically. The price is that you need
> to
> >> run
> >> > and monitor a long-running
> >> >  job. This feature is based on Spark Streaming, so you need knowledge
> of
> >> > it.
> >> >
> >> > I am curious about what is the maximum time-lag your customers
> >> > can tolerate?
> >> > Personally, I guess minute level time-lag is ok for

Re: Pinot/Kylin/Druid quick comparision

2023-12-04 Thread Nam Đỗ Duy via user

Hi Xiaoxiang, thank you

If my client uses a cloud computing service such as GCP or AWS, which will
cost more: Kylin's precalculation or ClickHouse? (For Kylin, my
understanding is that the query computation is done once and stored in the
cube to be reused many times, so Kylin uses less cloud computation. Is that
true?)

On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu  wrote:

> Following text is part of an article(
> https://zhuanlan.zhihu.com/p/343394287) .
>
>
> ===
>
> Kylin is suitable for aggregation queries with fixed patterns because of its
> precomputation technology, for example when the join, group by, and where
> condition patterns in SQL are relatively fixed. The larger the data volume,
> the more obvious the advantages of using Kylin. Kylin's advantages are
> especially large in deduplication (count distinct), Top N, Percentile, and
> similar scenarios, and it is used in a large number of scenarios, such as
> dashboards, all kinds of reports, large-screen displays, traffic statistics,
> and user behavior analysis. Meituan, Aurora, Shell Housing, etc. use Kylin to
> build their data service platforms, serving millions to tens of millions of
> queries per day, and most of the queries can be completed within 2 - 3
> seconds. There is no better alternative for such a high-concurrency
> scenario.
>
> ClickHouse, because of its MPP architecture, has high computing power and
> is more suitable when query requests are more flexible, or when there is
> a need for detailed queries with low concurrency. Scenarios include: user
> label filtering with arbitrary combinations of very many columns and where
> conditions, complex ad hoc queries with low concurrency, and so on. If the
> amount of data and access is large, you need to deploy a distributed
> ClickHouse cluster, which is a bigger challenge for operation and
> maintenance.
>
> If some queries are very flexible but infrequent, it is more
> resource-efficient to compute them on the fly. Since the number of queries
> is small, even if each query consumes a lot of computational resources, it
> is still cost-effective overall. If some queries have a fixed pattern and
> the query volume is large, Kylin is more suitable: large computational
> resources are spent once to save the results, and that upfront
> computational cost is amortized over every query, so it is the most
> economical.
>
> --- Translated with DeepL.com (free version)
>
>
> 
> With warm regard
> Xiaoxiang Yu
>
>
>
> On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy  wrote:
>
>> Thank you Xiaoxiang for the near real time streaming feature. That's
>> great.
>>
>> This morning my team faced a new challenge: ClickHouse offered us the speed
>> of calculating 8 billion rows in milliseconds, which is faster than my
>> demonstration (I used Kylin to calculate 1 billion rows in 2.9 seconds).
>>
>> Can you briefly suggest the advantages of Kylin over ClickHouse so that I
>> can defend my demonstration?
>>
>> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu  wrote:
>>
>> > 1. "In this important scenario of realtime analytics, the reason here is
>> > that
>> > kylin has lag time due to model update of new segment build, is that
>> > correct?"
>> >
>> > You are correct.
>> >
>> > 2. "If that is true, then can you suggest a work-around of combination
>> of
>> > ... "
>> >
>> > Kylin is planning to introduce NRT streaming(coding is completed but not
>> > released),
>> > which can make the time-lag to about 3 minutes(that is my estimation
>> but I
>> > am
>> > quite certain about it).
>> > NRT stands for 'near real-time', it will run a job and do micro-batch
>> > aggregation and persistence periodically. The price is that you need to
>> run
>> > and monitor a long-running
>> >  job. This feature is based on Spark Streaming, so you need knowledge of
>> > it.
>> >
>> > I am curious about what is the maximum time-lag your customers
>> > can tolerate?
>> > Personally, I guess minute level time-lag is ok for most cases.
>> >
>> > 
>> > With warm regard
>> > Xiaoxiang Yu
>> >
>> >
>> >
>> > On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy 
>> wrote:
>> >
>> > > Druid is better in
>> > > - Have a real-time datasource like Kafka etc.
>> > >
>> > > ==
>> > >
>> > > Hi Xiaoxiang, thank you for your response.
>> > >
>> > > In this important scenario of realtime alalytics, the reason here is
>> that
>> > > kylin has lag time due to model update of new segment build, is that
>> > > correct?
>> > >
>> > > If that is true, then can you suggest a work-around of combination of
>> :
>> > >
>> > > (time - lag kylin cube) + (realtime DB update) to provide
>> > > realtime capability ?

Re: Pinot/Kylin/Druid quick comparision

2023-12-03 Thread Xiaoxiang Yu

The following text is part of an article
(https://zhuanlan.zhihu.com/p/343394287).

===

Kylin is suitable for aggregation queries with fixed patterns because of its
precomputation technology, for example when the join, group by, and where
condition patterns in SQL are relatively fixed. The larger the data volume,
the more obvious the advantages of using Kylin. Kylin's advantages are
especially large for deduplication (count distinct), Top N, Percentile and
similar scenarios, and it is used in a large number of scenarios, such as
dashboards, all kinds of reports, large-screen displays, traffic statistics,
and user behavior analysis. Meituan, Aurora, Shell Housing, etc. use Kylin to
build their data service platforms, serving millions to tens of millions of
queries per day, and most of the queries complete within 2 - 3 seconds.
There is no better alternative for such a high-concurrency scenario.

ClickHouse, because of its MPP architecture, has high computing power and
is more suitable when query requests are more flexible, or when detail-level
queries are needed at low concurrency. Scenarios include: user label
filtering over very many columns with arbitrarily combined where conditions,
complex ad hoc queries at low concurrency, and so on. If the amount of data
and access is large, you need to deploy a distributed ClickHouse cluster,
which is a bigger challenge for operation and maintenance.

If some queries are very flexible but infrequent, it is more
resource-efficient to compute them on the fly. Since the number of queries is
small, even if each query consumes a lot of computational resources, it is
still cost-effective overall. If some queries have a fixed pattern and the
query volume is large, Kylin is more suitable: large computational resources
are spent once to compute and save the results, and that upfront cost is
amortized over each query, so it is the most economical.
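
The amortization argument above can be made concrete with a little
arithmetic. The sketch below is only an illustration: the function name and
all cost figures are hypothetical, not measurements from Kylin or ClickHouse.
It computes how many queries are needed before a one-time precomputation
becomes no more expensive than computing every query from scratch.

```python
import math


def break_even_queries(precompute_cost, cost_per_precomputed_query,
                       cost_per_on_the_fly_query):
    """Smallest query count at which precomputation is no more expensive overall.

    All costs are in the same arbitrary unit (for example CPU-seconds).
    Returns None when a precomputed query is not cheaper than an on-the-fly one,
    because then the upfront cost can never be recovered.
    """
    saving_per_query = cost_per_on_the_fly_query - cost_per_precomputed_query
    if saving_per_query <= 0:
        return None
    return math.ceil(precompute_cost / saving_per_query)


# Hypothetical numbers: a build that costs 1000 units, after which each query
# costs 0.5 units instead of 10.5 units.
print(break_even_queries(1000.0, 0.5, 10.5))
```

With these made-up numbers the build pays for itself after 100 queries, which
is why a large, fixed-pattern query volume favors precomputation while a
handful of flexible queries does not.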

--- Translated with DeepL.com (free version)

With warm regard
Xiaoxiang Yu

On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy  wrote:

> Thank you Xiaoxiang for the near real time streaming feature. That's great.
>
> This morning there has been a new challenge to my team: clickhouse offered
> us the speed of calculating 8 billion rows in millisecond which is faster
> than my demonstration (I used Kylin to do calculating 1 billion rows in 2.9
> seconds)
>
> Can you briefly suggest the advantages of kylin over clickhouse so that I
> can defend my demonstration.
>
> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu  wrote:
>
> > 1. "In this important scenario of realtime analytics, the reason here is
> > that
> > kylin has lag time due to model update of new segment build, is that
> > correct?"
> >
> > You are correct.
> >
> > 2. "If that is true, then can you suggest a work-around of combination of
> > ... "
> >
> > Kylin is planning to introduce NRT streaming(coding is completed but not
> > released),
> > which can make the time-lag to about 3 minutes(that is my estimation but
> I
> > am
> > quite certain about it).
> > NRT stands for 'near real-time', it will run a job and do micro-batch
> > aggregation and persistence periodically. The price is that you need to
> run
> > and monitor a long-running
> >  job. This feature is based on Spark Streaming, so you need knowledge of
> > it.
> >
> > I am curious about what is the maximum time-lag your customers
> > can tolerate?
> > Personally, I guess minute level time-lag is ok for most cases.
> >
> > 
> > With warm regard
> > Xiaoxiang Yu
> >
> >
> >
> > On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy 
> wrote:
> >
> > > Druid is better in
> > > - Have a real-time datasource like Kafka etc.
> > >
> > > ==
> > >
> > > Hi Xiaoxiang, thank you for your response.
> > >
> > > In this important scenario of realtime alalytics, the reason here is
> that
> > > kylin has lag time due to model update of new segment build, is that
> > > correct?
> > >
> > > If that is true, then can you suggest a work-around of combination of :
> > >
> > > (time - lag kylin cube) + (realtime DB update) to provide
> > > realtime capability ?
> > >
> > > IMO, the point here is to find that (realtime DB update) and integrate
> it
> > > with (time - lag kylin cube).
> > >
> > > On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu  wrote:
> > >
> > > > I researched and tested Druid two years ago(I don't know too much
> about
> > > >  the change of Druid in these two years. New features that I know
> are :
> > > > new UI, fully on K8s etc).
> > > >
> > > > Here are some cases you should consider using Druid other than Kylin
> > > > at the moment (using Kylin 5.0-beta to compare the Druid which I

Re: Pinot/Kylin/Druid quick comparision

2023-12-03 Thread Nam Đỗ Duy via user

Thank you Xiaoxiang for the near real time streaming feature. That's great.

This morning there has been a new challenge to my team: clickhouse offered
us the speed of calculating 8 billion rows in millisecond which is faster
than my demonstration (I used Kylin to do calculating 1 billion rows in 2.9
seconds)

Can you briefly suggest the advantages of kylin over clickhouse so that I
can defend my demonstration.

On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu  wrote:

> 1. "In this important scenario of realtime analytics, the reason here is
> that
> kylin has lag time due to model update of new segment build, is that
> correct?"
>
> You are correct.
>
> 2. "If that is true, then can you suggest a work-around of combination of
> ... "
>
> Kylin is planning to introduce NRT streaming(coding is completed but not
> released),
> which can make the time-lag to about 3 minutes(that is my estimation but I
> am
> quite certain about it).
> NRT stands for 'near real-time', it will run a job and do micro-batch
> aggregation and persistence periodically. The price is that you need to run
> and monitor a long-running
>  job. This feature is based on Spark Streaming, so you need knowledge of
> it.
>
> I am curious about what is the maximum time-lag your customers
> can tolerate?
> Personally, I guess minute level time-lag is ok for most cases.
>
> 
> With warm regard
> Xiaoxiang Yu
>
>
>
> On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy  wrote:
>
> > Druid is better in
> > - Have a real-time datasource like Kafka etc.
> >
> > ==
> >
> > Hi Xiaoxiang, thank you for your response.
> >
> > In this important scenario of realtime alalytics, the reason here is that
> > kylin has lag time due to model update of new segment build, is that
> > correct?
> >
> > If that is true, then can you suggest a work-around of combination of :
> >
> > (time - lag kylin cube) + (realtime DB update) to provide
> > realtime capability ?
> >
> > IMO, the point here is to find that (realtime DB update) and integrate it
> > with (time - lag kylin cube).
> >
> > On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu  wrote:
> >
> > > I researched and tested Druid two years ago(I don't know too much about
> > >  the change of Druid in these two years. New features that I know are :
> > > new UI, fully on K8s etc).
> > >
> > > Here are some cases you should consider using Druid other than Kylin
> > > at the moment (using Kylin 5.0-beta to compare the Druid which I used
> two
> > > years ago):
> > >
> > > - Have a real-time datasource like Kafka etc.
> > > - Most queries are small(Based on my test result, I think Druid had
> > better
> > > response time for small queries two years ago.)
> > > - Don't know how to optimize Spark/Hadoop, want to use the K8S/public
> > >   cloud platform as your deployment platform.
> > >
> > > But I do think there are many scenarios in which Kylin could be better,
> > > like:
> > >
> > > - Better performance for complex/big queries. Kylin can have a more
> > > exact-match/fine-grained
> > >   Index for queries containing different `Group By dimensions`.
> > > - User-friendly UI for modeling.
> > > - Support 'Join' better? (Not sure at the moment)
> > > - ODBC driver for different BI.(its website did not show it supports
> ODBC
> > > well)
> > > - Looks like Kylin supports ANSI SQL better than Druid.
> > >
> > >
> > > I don't know Pinot, so I have nothing to say about it.
> > > Hope to help you, or you are free to share your opinion.
> > >
> > > 
> > > With warm regard
> > > Xiaoxiang Yu
> > >
> > >
> > >
> > > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy 
> > wrote:
> > >
> > >> Dear Xiaoxiang,
> > >> Sirs/Madams,
> > >>
> > >> May I post my boss's question:
> > >>
> > >> What are the pros and cons of the OLAP platform Kylin compared to
> Pinot
> > >> and
> > >> Druid?
> > >>
> > >> Please kindly let me know
> > >>
> > >> Thank you very much and best regards
> > >>
> > >
> >
>

Re: Pinot/Kylin/Druid quick comparision

2023-12-03 Thread Xiaoxiang Yu

1. "In this important scenario of realtime analytics, the reason here is
that kylin has lag time due to model update of new segment build, is that
correct?"

You are correct.

2. "If that is true, then can you suggest a work-around of combination of
... "

Kylin is planning to introduce NRT streaming (coding is completed but not
released), which can reduce the time-lag to about 3 minutes (that is my
estimate, but I am quite certain about it).
NRT stands for 'near real-time'; it will run a job that periodically does
micro-batch aggregation and persistence. The price is that you need to run
and monitor a long-running job. This feature is based on Spark Streaming, so
you need knowledge of it.
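
To give a feel for what "micro-batch aggregation" means, here is a minimal
sketch in plain Python. It is not Kylin's NRT implementation and does not use
Spark Streaming; the function name, the event layout `(timestamp, key, value)`
and the 180-second window (chosen only to mirror the "about 3 minutes"
estimate) are assumptions made for illustration.

```python
from collections import defaultdict


def micro_batch_aggregate(events, batch_seconds):
    """Group (timestamp, key, value) events into fixed windows and sum per key.

    Each window is identified by its start time in seconds. A real streaming
    job would persist every finished window; here we simply return them all.
    """
    batches = defaultdict(lambda: defaultdict(int))
    for timestamp, key, value in events:
        window_start = (timestamp // batch_seconds) * batch_seconds
        batches[window_start][key] += value
    return {start: dict(totals) for start, totals in sorted(batches.items())}


# Hypothetical events: seconds since the job started, a dimension value, a measure.
events = [(0, "a", 1), (59, "a", 2), (60, "b", 5), (185, "a", 7)]
print(micro_batch_aggregate(events, batch_seconds=180))
```

Data only becomes queryable once its window has been aggregated and saved,
which is where a lag on the order of the window length comes from.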

I am curious: what is the maximum time-lag your customers can tolerate?
Personally, I guess a minute-level time-lag is OK for most cases.


With warm regard
Xiaoxiang Yu



On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy  wrote:

> Druid is better in
> - Have a real-time datasource like Kafka etc.
>
> ==
>
> Hi Xiaoxiang, thank you for your response.
>
> In this important scenario of realtime alalytics, the reason here is that
> kylin has lag time due to model update of new segment build, is that
> correct?
>
> If that is true, then can you suggest a work-around of combination of :
>
> (time - lag kylin cube) + (realtime DB update) to provide
> realtime capability ?
>
> IMO, the point here is to find that (realtime DB update) and integrate it
> with (time - lag kylin cube).
>
> On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu  wrote:
>
> > I researched and tested Druid two years ago(I don't know too much about
> >  the change of Druid in these two years. New features that I know are :
> > new UI, fully on K8s etc).
> >
> > Here are some cases you should consider using Druid other than Kylin
> > at the moment (using Kylin 5.0-beta to compare the Druid which I used two
> > years ago):
> >
> > - Have a real-time datasource like Kafka etc.
> > - Most queries are small(Based on my test result, I think Druid had
> better
> > response time for small queries two years ago.)
> > - Don't know how to optimize Spark/Hadoop, want to use the K8S/public
> >   cloud platform as your deployment platform.
> >
> > But I do think there are many scenarios in which Kylin could be better,
> > like:
> >
> > - Better performance for complex/big queries. Kylin can have a more
> > exact-match/fine-grained
> >   Index for queries containing different `Group By dimensions`.
> > - User-friendly UI for modeling.
> > - Support 'Join' better? (Not sure at the moment)
> > - ODBC driver for different BI.(its website did not show it supports ODBC
> > well)
> > - Looks like Kylin supports ANSI SQL better than Druid.
> >
> >
> > I don't know Pinot, so I have nothing to say about it.
> > Hope to help you, or you are free to share your opinion.
> >
> > 
> > With warm regard
> > Xiaoxiang Yu
> >
> >
> >
> > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy 
> wrote:
> >
> >> Dear Xiaoxiang,
> >> Sirs/Madams,
> >>
> >> May I post my boss's question:
> >>
> >> What are the pros and cons of the OLAP platform Kylin compared to Pinot
> >> and
> >> Druid?
> >>
> >> Please kindly let me know
> >>
> >> Thank you very much and best regards
> >>
> >
>

Re: Pinot/Kylin/Druid quick comparision

2023-12-03 Thread Nam Đỗ Duy via user

Druid is better in
- Have a real-time datasource like Kafka etc.

==

Hi Xiaoxiang, thank you for your response.

In this important scenario of realtime analytics, the reason here is that
kylin has lag time due to model update of new segment build, is that
correct?

If that is true, then can you suggest a work-around of combination of :

(time - lag kylin cube) + (realtime DB update) to provide
realtime capability ?

IMO, the point here is to find that (realtime DB update) and integrate it
with (time - lag kylin cube).
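
To make the idea concrete, here is a rough sketch of what I mean by combining
the two sources. Everything in it is hypothetical (the function name, the row
layout, the notion of a "watermark" marking the last data included in the
cube build); it is not an existing Kylin feature, and it only works for
additive measures such as SUM or COUNT.

```python
def merge_with_realtime(cube_totals, realtime_rows, cube_watermark):
    """Add rows newer than the cube's last build to the precomputed totals.

    cube_totals:    {key: total} as returned by the (time-lagged) cube query
    realtime_rows:  iterable of (timestamp, key, value) from the realtime DB
    cube_watermark: timestamp of the newest data already included in the cube
    """
    merged = dict(cube_totals)
    for timestamp, key, value in realtime_rows:
        if timestamp > cube_watermark:  # older rows are already in the cube
            merged[key] = merged.get(key, 0) + value
    return merged


cube_totals = {"a": 10, "b": 4}
realtime_rows = [(90, "a", 1), (110, "a", 2), (120, "c", 3)]
print(merge_with_realtime(cube_totals, realtime_rows, cube_watermark=100))
```

The row at timestamp 90 is skipped because the cube already covers it; only
the two newer rows are added on top of the cube result.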

On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu  wrote:

> I researched and tested Druid two years ago(I don't know too much about
>  the change of Druid in these two years. New features that I know are :
> new UI, fully on K8s etc).
>
> Here are some cases you should consider using Druid other than Kylin
> at the moment (using Kylin 5.0-beta to compare the Druid which I used two
> years ago):
>
> - Have a real-time datasource like Kafka etc.
> - Most queries are small(Based on my test result, I think Druid had better
> response time for small queries two years ago.)
> - Don't know how to optimize Spark/Hadoop, want to use the K8S/public
>   cloud platform as your deployment platform.
>
> But I do think there are many scenarios in which Kylin could be better,
> like:
>
> - Better performance for complex/big queries. Kylin can have a more
> exact-match/fine-grained
>   Index for queries containing different `Group By dimensions`.
> - User-friendly UI for modeling.
> - Support 'Join' better? (Not sure at the moment)
> - ODBC driver for different BI.(its website did not show it supports ODBC
> well)
> - Looks like Kylin supports ANSI SQL better than Druid.
>
>
> I don't know Pinot, so I have nothing to say about it.
> Hope to help you, or you are free to share your opinion.
>
> 
> With warm regard
> Xiaoxiang Yu
>
>
>
> On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy  wrote:
>
>> Dear Xiaoxiang,
>> Sirs/Madams,
>>
>> May I post my boss's question:
>>
>> What are the pros and cons of the OLAP platform Kylin compared to Pinot
>> and
>> Druid?
>>
>> Please kindly let me know
>>
>> Thank you very much and best regards
>>
>

Re: Pinot/Kylin/Druid quick comparision

2023-11-30 Thread Xiaoxiang Yu

I researched and tested Druid two years ago (I don't know much about how
Druid has changed in these two years. New features that I know of are: new
UI, fully on K8s etc).

Here are some cases where you should consider using Druid rather than Kylin
at the moment (comparing Kylin 5.0-beta with the Druid I used two
years ago):

- You have a real-time datasource like Kafka etc.
- Most queries are small (based on my test results, I think Druid had better
  response time for small queries two years ago).
- You don't know how to optimize Spark/Hadoop and want to use K8S or a public
  cloud platform as your deployment platform.

But I do think there are many scenarios in which Kylin could be better,
like:

- Better performance for complex/big queries. Kylin can have a more
  exact-match/fine-grained index for queries containing different
  `Group By dimensions`.
- User-friendly UI for modeling.
- Better support for 'Join'? (Not sure at the moment)
- ODBC driver for different BI tools (Druid's website did not show that it
  supports ODBC well).
- Looks like Kylin supports ANSI SQL better than Druid.


I don't know Pinot, so I have nothing to say about it.
I hope this helps; feel free to share your opinion.


With warm regard
Xiaoxiang Yu



On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy  wrote:

> Dear Xiaoxiang,
> Sirs/Madams,
>
> May I post my boss's question:
>
> What are the pros and cons of the OLAP platform Kylin compared to Pinot and
> Druid?
>
> Please kindly let me know
>
> Thank you very much and best regards
>

Re: How to use measure of Kylin in query

2023-11-22 Thread Xiaoxiang Yu

Your summary
"I write query normally to query the desired column and Kylin uses the
index mechanism to accelerate query"
is almost right, but I cannot understand exactly what your question is.
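
As a rough illustration of why the query text itself does not change, here is
a toy sketch in plain Python. It is not Kylin's actual index format: the
per-day Python sets stand in for the bitmaps that a precise count distinct
measure precomputes, and the table layout and function names are made up for
the example.

```python
# Hypothetical fact rows: (day, user_id).
rows = [
    ("2023-11-01", "u1"),
    ("2023-11-01", "u2"),
    ("2023-11-02", "u1"),
    ("2023-11-02", "u3"),
]


def build_index(rows):
    """Precompute, per day, the set of distinct users (the 'build index' step)."""
    index = {}
    for day, user in rows:
        index.setdefault(day, set()).add(user)
    return index


def count_distinct_users(index, days):
    """Answer COUNT(DISTINCT user) for the given days by merging precomputed sets."""
    users = set()
    for day in days:
        users |= index.get(day, set())
    return len(users)


index = build_index(rows)
print(count_distinct_users(index, ["2023-11-01", "2023-11-02"]))
```

The user still writes an ordinary `count(distinct ...)` query; the engine
decides whether a prebuilt index can answer it by merging stored results
instead of scanning the raw rows.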




With warm regard
Xiaoxiang Yu



On Wed, Nov 22, 2023 at 5:32 PM Nam Đỗ Duy  wrote:

> Dear Sir / Madam
>
> I've searched the web but cannot find how to use Kylin's measures. For
> example, judging by the quote below from the linked document, it seems that
> the measure's magic is as follows: "I write query normally to query the
> desired column and Kylin uses the index mechanism to accelerate query". Can
> you please advise?
>
> Count Distinct (Precise) | Welcome to Kylin 5 (apache.org)
> <https://kylin.apache.org/5.0/docs/modeling/model_design/measure_design/count_distinct_bitmap>
>
> Once the measure is added and the model is saved, you need to go to the
> Edit
> Aggregate Index page, add the corresponding dimensions and measures to the
> appropriate aggregate group according to your business scenario, and the
> new aggregate index will be generated after submission. You need to build
> index and load data to complete the precomputation of the target column.
> You can check the job of Build Index in the Job Monitor page. After the
> index is built, you can use the Count Distinct (Precise) measure to do some
> querying.
>

Re: Kylin real usecase on AI/ML (data science) project

2023-11-13 Thread Xiaoxiang Yu

Yes, you are right.

--

Best wishes to you ! 
From ：Xiaoxiang Yu

At 2023-11-13 17:57:59, "Nam Đỗ Duy via user"  wrote:

Thank you Xiaoxiang, 

1. For my question about near-real-time data: this scenario is not about
querying the cube (index); I am referring to queries against the Hive table
only. Is it possible to instantly query the non-cube data if the data is
already in Hive?

Best regards

On Mon, Nov 13, 2023 at 4:23 PM Xiaoxiang Yu  wrote:

1. Querying them instantly is not possible; you need to trigger a build job and
wait for it to complete, which takes about 5-30 minutes in most cases. So
the delay caused by Kylin is 5-30 minutes.

2. DS/AI can send SQL queries using Python and get the result (if kylinpy works
well), just like you do in the Kylin insight window.

--

Best wishes to you ! 
From ：Xiaoxiang Yu

At 2023-11-13 17:09:59, "Nam Đỗ Duy via user"  wrote:

Thank you Xiaoxiang for answering my previous question

1. For previous question 1, if I can ingest data near real-time into Hive 
table, can that near realtime data be queried in Kylin insights windows by SQL 
query almost instantly? If not then how can I reflect near realtime data in 
(Kylin insights Window as well as in PowerBI report which connect to Kylin via 
mez)?

2. For previous question 2, if DS/AI team cannot access Kylin parquet file via 
java/python/scala then can they:

2.1) access the Hive Star schema table?
2.2) access kylin cube via API?
2.3) access computed fields of kylin cube via API?
2.4) access kylin model's measures via API?

Thank you very much

On Mon, Nov 13, 2023 at 3:53 PM Xiaoxiang Yu  wrote:

Hi,
Question 1:
You are almost right.
If the Cube is not ready, Kylin will use SparkSQL to execute the query directly
on the original tables.

Question 2:
It is possible but very hard.
The index data are saved in Parquet format, so it is possible to read them with
Spark, but the column names are encoded, so you cannot tell which columns are
useful to you. The mapping from the parquet files' columns to the Model's
dimensions or measures is stored in Kylin's metastore, so knowledge of the Kylin
source code is required to make good use of model/index files when reading them
directly.

If we had a Python library (like
https://github.com/Kyligence/kylinpy/tree/master) that lets you send SQL to
Kylin, would it be helpful to your Data science team?
The following is an example.

```
 >>> import sqlalchemy as sa
 >>> import pandas as pd
 >>> kylin_engine = sa.create_engine('kylin://ADMIN:KYLIN@sandbox/learn_kylin?timeout=60&is_debug=1')
 >>> sql = 'select * from kylin_sales limit 10'
 >>> pd.read_sql(sql, kylin_engine)

```
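
If installing kylinpy is not an option, the same SQL can also be sent over
Kylin's REST query endpoint using only the Python standard library. The
following is a minimal sketch, not an official client: it assumes the
`/kylin/api/query` endpoint on a local instance and the `project`/`sql` JSON
body used in the Scala sample posted elsewhere on this list, and the host,
credentials and project name are placeholders. It only builds the request;
the call that actually sends it is left commented out.

```python
import base64
import json
import urllib.request

# Assumed address of a local Kylin instance; adjust for your deployment.
KYLIN_QUERY_URL = "http://localhost:7070/kylin/api/query"


def build_query_request(sql, project, user, password):
    """Build (but do not send) a POST request carrying one SQL query."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    # json.dumps escapes quotes and newlines inside the SQL text safely.
    body = json.dumps({"project": project, "sql": sql}).encode("utf-8")
    return urllib.request.Request(
        KYLIN_QUERY_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
    )


request = build_query_request(
    "select * from kylin_sales limit 10", "learn_kylin", "ADMIN", "KYLIN"
)

# To run it against a live instance:
# with urllib.request.urlopen(request, timeout=60) as response:
#     result = json.loads(response.read())
```

Building the body with `json.dumps` rather than string interpolation avoids
producing invalid JSON when the SQL contains double quotes or line breaks.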

--

Best wishes to you ! 
From ：Xiaoxiang Yu

At 2023-11-13 16:02:20, "Nam Đỗ Duy via user"  wrote:

Hi Xiaoxiang,

Basically you can imagine the scenario that there will be 3 teams who will be
using Kylin's Cube:

a) Data analyst team (DA) who is using PowerBI (via ODBC or mez) and Superset to
access the Kylin Cube as well.
b) Data science team (DS) who is using Pyspark and SparkML, currently accessing
HDFS and parquet directly as raw files.
c) AI team who is using various interfaces like Java, Python, Scala to access
HDFS and parquet directly as raw files.

I have two questions:

1) For team a) DA: when using the ODBC or mez connector, if the Cube is not ready
then I guess PowerBI is accessing the HIVE parquet file, isn't it?

2) For the DS/AI team: you see they are accessing the raw hdfs/parquet, so how can
Hive/Kylin provide more benefits to these teams? For this question, I am thinking
of OLAP speed or computed metrics etc., but I am not sure, so please advise.

Thank you very much

On Mon, Nov 13, 2023 at 2:40 PM Xiaoxiang Yu  wrote:

Do you have any specific business scenario? Looks like there is 
not such real usecase at the moment. 

--

Best wishes to you ! 
From ：Xiaoxiang Yu

At 2023-11-13 11:36:35, "Nam Đỗ Duy via user"  wrote:

Dear Sir/Madam

I am persuading my company to use kylin as olap platform so please kindly share 
with me (inbox me if you hesitate to share publicly) your real use-cases to 
help me answer our boss’s question:

1. Which companies are using kylin now
2. How do you use kylin’s capabilities in your AI/ML projects

Thank you very much for your valuable time and support

Re: Kylin real usecase on AI/ML (data science) project

2023-11-13 Thread Nam Đỗ Duy via user

Thank you Xiaoxiang,

1. For my question about near-real-time data: this scenario is not about
querying the cube (index); I am referring to queries against the Hive table
only. Is it possible to instantly query the non-cube data if the data
is already in Hive?

Best regards

On Mon, Nov 13, 2023 at 4:23 PM Xiaoxiang Yu  wrote:

> 1.  Query them instantly is not possible, you need to trigger a build job
> and wait it completed, it will cost about 5-30 mintues in most cases. So
> the delay caused by Kylin is 5-30 minites.
>
> 2. DS/AI can send SQL query using Python and get the result(if kylinpy
> works well), just like you do in Kylin insight window.
>
>
>
>
> --
> *Best wishes to you ! *
> *From ：**Xiaoxiang Yu*
>
>
> At 2023-11-13 17:09:59, "Nam Đỗ Duy via user" 
> wrote:
>
> Thank you Xiaoxiang for answering my previous question
>
> 1. For previous question 1, if I can ingest data near real-time into Hive
> table, can that near realtime data be queried in Kylin insights windows by
> SQL query almost instantly? If not then how can I reflect near
> realtime data in (Kylin insights Window as well as in PowerBI report which
> connect to Kylin via mez)?
>
> 2. For previous question 2, if DS/AI team cannot access Kylin parquet file
> via java/python/scala then can they:
>
> 2.1) access the Hive Star schema table?
> 2.2) access kylin cube via API?
> 2.3) access computed fields of kylin cube via API
> 2.4 access kylin model's  measures via API
>
> Thank you very much
>
> On Mon, Nov 13, 2023 at 3:53 PM Xiaoxiang Yu  wrote:
>
>> Hi,
>> Question 1:
>> You are almost right.
>> If the Cube not ready, Kylin will use SparkSQL to execute query directly
>> on original tables.
>>
>> Question 2:
>> It is possible but very hard.
>> The index data are saved in Parquet format, it is possible to read them
>> by Spark, but the columns' name are encoded
>>  so you don't understand which columns are useful to you. The mapping
>> from parquet files'
>> columns to Model's dimensions or measures is stored Kylin's metastore, so
>> the knowledge of Kylin source code
>> is required to make good use of model/index files when reading them
>> directly.
>>
>> If we have a Python library(like
>> https://github.com/Kyligence/kylinpy/tree/master) which provide
>>  the ability that you can send SQL to Kylin. Will it be helpful to your
>> Data science team?
>> Following is an example.
>>
>>
>> ```
>>  >>> import sqlalchemy as sa
>>  >>> import pandas as pd
>>  >>> kylin_engine = sa.create_engine('kylin://ADMIN:KYLIN@sandbox/learn_kylin?timeout=60&is_debug=1')
>>  >>> sql = 'select * from kylin_sales limit 10'
>>  >>> pd.read_sql(sql, kylin_engine)
>>
>> ```
>>
>>
>>
>>
>> --
>> *Best wishes to you ! *
>> *From ：**Xiaoxiang Yu*
>>
>>
>> At 2023-11-13 16:02:20, "Nam Đỗ Duy via user" 
>> wrote:
>>
>> Hi Xiaoxiang,
>>
>> Basically you can imagine the scenario that there will be3 teams who will
>> be using Kylin's Cube:
>>
>> a) Data analyst team (DA) who is using PowerBI (via ODBC or mez),
>> superset to access kylin Cube as well.
>> b) Data science team (DS) who is using Pyspark, SparkML currently
>> assessing HDFS and parquet directly as raw file.
>> c) AI team who is using various interfaces like Java, Python, Scala to
>> assess HDFS and parquet directly as raw file.
>>
>> I have two questions:
>>
>> 1) For team a) DA: when using the ODBC or mez connector, if the Cube not
>> ready then I guess the PowerBI is accessing HIVE parquet file, is n't it?
>> 2) For DS/AI team: you see they are accessing the raw hdfs/parquet then
>> how can Hive/Kylin provide more merits to these teams? For this question, I
>> imagine of OLAP speed or computed metrics etc but I am not sure so please
>> advise
>>
>> Thank you very much
>>
>>
>>
>>
>> On Mon, Nov 13, 2023 at 2:40 PM Xiaoxiang Yu  wrote:
>>
>>> Do you have any specific business scenario? Looks like there is
>>> not such real usecase at the moment.
>>>
>>>
>>>
>>> --
>>> *Best wishes to you ! *
>>> *From ：**Xiaoxiang Yu*
>>>
>>>
>>> At 2023-11-13 11:36:35, "Nam Đỗ Duy via user" 
>>> wrote:
>>>
>>> Dear Sir/Madam
>>>
>>> I am persuading my company to use kylin as olap platform so please
>>> kindly share with me (inbox me if you hesitate to share publicly) your real
>>> use-cases to help me answer our boss’s question:
>>>
>>> 1. Which companies are using kylin now
>>> 2. How do you use kylin’s capabilities in your AI/ML projects
>>>
>>> Thank you very much for your valuable time and support
>>>
>>>

Re: Kylin real usecase on AI/ML (data science) project

2023-11-13 Thread Xiaoxiang Yu

1. Querying them instantly is not possible; you need to trigger a build job and
wait for it to complete, which takes about 5-30 minutes in most cases. So
the delay caused by Kylin is 5-30 minutes.

2. DS/AI can send SQL queries using Python and get the result (if kylinpy works
well), just like you do in the Kylin insight window.

--

Best wishes to you ! 
From ：Xiaoxiang Yu

At 2023-11-13 17:09:59, "Nam Đỗ Duy via user"  wrote:

Thank you Xiaoxiang for answering my previous question

1. For previous question 1, if I can ingest data near real-time into Hive 
table, can that near realtime data be queried in Kylin insights windows by SQL 
query almost instantly? If not then how can I reflect near realtime data in 
(Kylin insights Window as well as in PowerBI report which connect to Kylin via 
mez)?

2. For previous question 2, if DS/AI team cannot access Kylin parquet file via 
java/python/scala then can they:

2.1) access the Hive Star schema table?
2.2) access kylin cube via API?
2.3) access computed fields of kylin cube via API?
2.4) access kylin model's measures via API?

Thank you very much

On Mon, Nov 13, 2023 at 3:53 PM Xiaoxiang Yu  wrote:

Hi,
Question 1:
You are almost right.
If the Cube is not ready, Kylin will use SparkSQL to execute the query directly
on the original tables.

Question 2:
It is possible but very hard.
The index data are saved in Parquet format, so it is possible to read them with
Spark, but the column names are encoded, so you cannot tell which columns are
useful to you. The mapping from the parquet files' columns to the Model's
dimensions or measures is stored in Kylin's metastore, so knowledge of the Kylin
source code is required to make good use of model/index files when reading them
directly.

If we had a Python library (like
https://github.com/Kyligence/kylinpy/tree/master) that lets you send SQL to
Kylin, would it be helpful to your Data science team?
The following is an example.

```
 >>> import sqlalchemy as sa
 >>> import pandas as pd
 >>> kylin_engine = sa.create_engine('kylin://ADMIN:KYLIN@sandbox/learn_kylin?timeout=60&is_debug=1')
 >>> sql = 'select * from kylin_sales limit 10'
 >>> pd.read_sql(sql, kylin_engine)

```

--

Best wishes to you ! 
From ：Xiaoxiang Yu

At 2023-11-13 16:02:20, "Nam Đỗ Duy via user"  wrote:

Hi Xiaoxiang,

Basically you can imagine the scenario that there will be 3 teams who will be
using Kylin's Cube:

a) Data analyst team (DA) who is using PowerBI (via ODBC or mez) and Superset to
access the Kylin Cube as well.
b) Data science team (DS) who is using Pyspark and SparkML, currently accessing
HDFS and parquet directly as raw files.
c) AI team who is using various interfaces like Java, Python, Scala to access
HDFS and parquet directly as raw files.

I have two questions:

1) For team a) DA: when using the ODBC or mez connector, if the Cube is not ready
then I guess PowerBI is accessing the HIVE parquet file, isn't it?

2) For the DS/AI team: you see they are accessing the raw hdfs/parquet, so how can
Hive/Kylin provide more benefits to these teams? For this question, I am thinking
of OLAP speed or computed metrics etc., but I am not sure, so please advise.

Thank you very much

On Mon, Nov 13, 2023 at 2:40 PM Xiaoxiang Yu  wrote:

Do you have any specific business scenario? Looks like there is 
not such real usecase at the moment. 

--

Best wishes to you ! 
From ：Xiaoxiang Yu

At 2023-11-13 11:36:35, "Nam Đỗ Duy via user"  wrote:

Dear Sir/Madam

I am trying to persuade my company to use Kylin as its OLAP platform, so please kindly share
with me (message me privately if you hesitate to share publicly) your real use cases to
help me answer our boss’s questions:

1. Which companies are using Kylin now?
2. How do you use Kylin’s capabilities in your AI/ML projects?

Thank you very much for your valuable time and support

Re: Kylin real usecase on AI/ML (data science) project

2023-11-13 Thread Nam Đỗ Duy via user

Thank you Xiaoxiang for answering my previous question

1. Regarding my previous question 1: if I can ingest data into a Hive table
in near real time, can that data be queried with SQL in the Kylin Insight
window almost instantly? If not, how can I reflect near-real-time data
in the Kylin Insight window, as well as in a PowerBI report that
connects to Kylin via mez?

2. Regarding my previous question 2: if the DS/AI teams cannot access Kylin's
Parquet files via Java/Python/Scala, can they:

2.1) access the Hive star schema tables?
2.2) access the Kylin cube via API?
2.3) access computed fields of the Kylin cube via API?
2.4) access the Kylin model's measures via API?

Thank you very much

On Mon, Nov 13, 2023 at 3:53 PM Xiaoxiang Yu  wrote:

> Hi,
> Question 1:
> You are almost right.
> If the Cube is not ready, Kylin will use SparkSQL to execute the query
> directly on the original tables.
>
> Question 2:
> It is possible but very hard.
> The index data is saved in Parquet format, so it is possible to read it with
> Spark, but the column names are encoded,
> so you cannot tell which columns are useful to you. The mapping from
> the Parquet files' columns to the model's dimensions or measures is stored
> in Kylin's metastore, so knowledge of the Kylin source code
> is required to make good use of model/index files when reading them
> directly.
>
> If there were a Python library (such as
> https://github.com/Kyligence/kylinpy/tree/master) that lets you
> send SQL to Kylin, would it be helpful to your
> data science team?
> The following is an example.
>
>
> ```
>  >>> import sqlalchemy as sa
>  >>> import pandas as pd
>  >>> kylin_engine = sa.create_engine('kylin://ADMIN:KYLIN@sandbox/learn_kylin?timeout=60&is_debug=1')
>  >>> sql = 'select * from kylin_sales limit 10'
>  >>> pd.read_sql(sql, kylin_engine)
> ```
>
>
>
>
> --
> *Best wishes to you ! *
> *From ：**Xiaoxiang Yu*
>
>
> At 2023-11-13 16:02:20, "Nam Đỗ Duy via user" 
> wrote:
>
> Hi Xiaoxiang,
>
> Basically, you can imagine a scenario in which three teams will
> be using Kylin's Cube:
>
> a) Data analyst team (DA), who use PowerBI (via ODBC or mez) and Superset
> to access the Kylin Cube.
> b) Data science team (DS), who use PySpark and SparkML and currently
> access HDFS and Parquet directly as raw files.
> c) AI team, who use various interfaces such as Java, Python, and Scala to
> access HDFS and Parquet directly as raw files.
>
> I have two questions:
>
> 1) For team a) DA: when using the ODBC or mez connector, if the Cube is not
> ready, I guess PowerBI is accessing the Hive Parquet files, isn't it?
> 2) For the DS/AI teams: they are accessing raw HDFS/Parquet, so what
> additional benefits can Hive/Kylin offer them? I imagine OLAP speed or
> precomputed metrics, but I am not sure, so please advise.
>
> Thank you very much
>
>
>
>
> On Mon, Nov 13, 2023 at 2:40 PM Xiaoxiang Yu  wrote:
>
>> Do you have a specific business scenario in mind? It looks like there is
>> no such real use case at the moment.
>>
>>
>>
>> --
>> *Best wishes to you ! *
>> *From ：**Xiaoxiang Yu*
>>
>>
>> At 2023-11-13 11:36:35, "Nam Đỗ Duy via user" 
>> wrote:
>>
>> Dear Sir/Madam
>>
>> I am persuading my company to use kylin as olap platform so please kindly
>> share with me (inbox me if you hesitate to share publicly) your real
>> use-cases to help me answer our boss’s question:
>>
>> 1. Which companies are using kylin now
>> 2. How do you use kylin’s capabilities in your AI/ML projects
>>
>> Thank you very much for your valuable time and support
>>
>>

Re: Kylin real usecase on AI/ML (data science) project

2023-11-13 Thread Xiaoxiang Yu

Hi,
Question 1:
You are almost right.
If the Cube is not ready, Kylin will use SparkSQL to execute the query directly on
the original tables.

Question 2:
It is possible but very hard.
The index data is saved in Parquet format, so it is possible to read it with
Spark, but the column names are encoded,
so you cannot tell which columns are useful to you. The mapping from
the Parquet files' columns to the model's dimensions or measures is stored in
Kylin's metastore, so knowledge of the Kylin source code
is required to make good use of model/index files when reading them directly.
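
To make the column-mapping point concrete, here is a toy sketch in plain Python. The encoded names ("0", "1", "100000") and the mapping itself are invented for illustration; the real mapping lives in Kylin's metastore, and its actual format is not described in this thread. `rename_encoded_columns` is a hypothetical helper name.

```python
def rename_encoded_columns(rows, column_mapping):
    """Rename encoded column keys to readable names; unmapped keys stay as-is."""
    return [
        {column_mapping.get(name, name): value for name, value in row.items()}
        for row in rows
    ]


# Hypothetical encoded column names and mapping, for illustration only.
encoded_rows = [{"0": "Shanghai", "1": "2023-11-13", "100000": 42}]
mapping = {"0": "CITY", "1": "ORDER_DATE", "100000": "SUM_PRICE"}

readable_rows = rename_encoded_columns(encoded_rows, mapping)
```

Without such a mapping the rows are readable by Spark but hard to interpret, which is why sending SQL to Kylin is usually the easier route.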

If there were a Python library (such as
https://github.com/Kyligence/kylinpy/tree/master) that lets you send SQL
to Kylin, would it be helpful to your data science team?
The following is an example.

```
 >>> import sqlalchemy as sa
 >>> import pandas as pd
 >>> kylin_engine = sa.create_engine('kylin://ADMIN:KYLIN@sandbox/learn_kylin?timeout=60&is_debug=1')
 >>> sql = 'select * from kylin_sales limit 10'
 >>> pd.read_sql(sql, kylin_engine)
```
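
For teams that would rather not add a dependency, the same SQL can in principle be posted to Kylin's REST query endpoint. The sketch below only builds the request and does not send it. The host and port (localhost:7070), the ADMIN/KYLIN credentials, and the learn_kylin project are sample values assumed from the examples in this thread, and `build_kylin_query_request` is a hypothetical helper name, not part of any Kylin client.

```python
import base64
import json
import urllib.request


def build_kylin_query_request(base_url, user, password, project, sql):
    """Build (but do not send) a POST request carrying a SQL query for Kylin."""
    payload = json.dumps({"project": project, "sql": sql}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url}/kylin/api/query",
        data=payload,
        headers={
            "Authorization": f"Basic {token}",  # HTTP basic authentication
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_kylin_query_request(
    "http://localhost:7070", "ADMIN", "KYLIN", "learn_kylin",
    "select * from kylin_sales limit 10",
)

# Sending requires a running Kylin instance, so it is left commented out:
# with urllib.request.urlopen(request, timeout=60) as response:
#     result = json.load(response)
```

Keeping request construction separate from sending makes the payload and headers easy to inspect before pointing the code at a real server.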

--

Best wishes to you ! 
From ：Xiaoxiang Yu

At 2023-11-13 16:02:20, "Nam Đỗ Duy via user"  wrote:

Hi Xiaoxiang,

Basically, you can imagine a scenario in which three teams will be
using Kylin's Cube:

a) Data analyst team (DA), who use PowerBI (via ODBC or mez) and Superset to
access the Kylin Cube.
b) Data science team (DS), who use PySpark and SparkML and currently access
HDFS and Parquet directly as raw files.
c) AI team, who use various interfaces such as Java, Python, and Scala to access
HDFS and Parquet directly as raw files.

I have two questions:

1) For team a) DA: when using the ODBC or mez connector, if the Cube is not ready,
I guess PowerBI is accessing the Hive Parquet files, isn't it?

2) For the DS/AI teams: they are accessing raw HDFS/Parquet, so what additional
benefits can Hive/Kylin offer them? I imagine OLAP speed or precomputed
metrics, but I am not sure, so please advise.

Thank you very much

On Mon, Nov 13, 2023 at 2:40 PM Xiaoxiang Yu  wrote:

Do you have a specific business scenario in mind? It looks like there is
no such real use case at the moment.

--

Best wishes to you ! 
From ：Xiaoxiang Yu

At 2023-11-13 11:36:35, "Nam Đỗ Duy via user"  wrote:

Dear Sir/Madam

I am trying to persuade my company to use Kylin as its OLAP platform, so please kindly share
with me (message me privately if you hesitate to share publicly) your real use cases to
help me answer our boss’s questions:

1. Which companies are using Kylin now?
2. How do you use Kylin’s capabilities in your AI/ML projects?

Thank you very much for your valuable time and support

Re: Kylin real usecase on AI/ML (data science) project

2023-11-13 Thread Nam Đỗ Duy via user

Hi Xiaoxiang,

Basically, you can imagine a scenario in which three teams will
be using Kylin's Cube:

a) Data analyst team (DA), who use PowerBI (via ODBC or mez) and Superset
to access the Kylin Cube.
b) Data science team (DS), who use PySpark and SparkML and currently access
HDFS and Parquet directly as raw files.
c) AI team, who use various interfaces such as Java, Python, and Scala to
access HDFS and Parquet directly as raw files.

I have two questions:

1) For team a) DA: when using the ODBC or mez connector, if the Cube is not
ready, I guess PowerBI is accessing the Hive Parquet files, isn't it?
2) For the DS/AI teams: they are accessing raw HDFS/Parquet, so what
additional benefits can Hive/Kylin offer them? I imagine OLAP speed or
precomputed metrics, but I am not sure, so please advise.

Thank you very much

On Mon, Nov 13, 2023 at 2:40 PM Xiaoxiang Yu  wrote:

> Do you have a specific business scenario in mind? It looks like there is
> no such real use case at the moment.
>
>
>
> --
> *Best wishes to you ! *
> *From ：**Xiaoxiang Yu*
>
>
> At 2023-11-13 11:36:35, "Nam Đỗ Duy via user" 
> wrote:
>
> Dear Sir/Madam
>
> I am persuading my company to use kylin as olap platform so please kindly
> share with me (inbox me if you hesitate to share publicly) your real
> use-cases to help me answer our boss’s question:
>
> 1. Which companies are using kylin now
> 2. How do you use kylin’s capabilities in your AI/ML projects
>
> Thank you very much for your valuable time and support
>
>

Re: How to set up a dimension with a tree-like hierarchy

2023-10-25 Thread ShaoFeng Shi

I don't think this is supported.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC,
Apache Incubator PMC,
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org




雨后初晴 <745579...@qq.com> wrote on Wed, Oct 25, 2023 at 00:49:

>
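> There is an "organization" dimension whose number of levels is not fixed. Its
> structure is similar to: orgId, parentOrgId, orgName, isLeaf. These are the
> organization ID, the parent organization ID, the organization name, and whether
> the organization is a leaf. Each organization has exactly one direct parent.
> The fact table only contains data for leaf organization IDs, but queries need
> data for organizations at any level. How can this kind of dimension be defined in Kylin?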
>
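
A workaround sometimes used for parent-child hierarchies in general (not something this thread confirms for Kylin) is to flatten the hierarchy outside Kylin into a fixed number of level columns per leaf organization, and then treat those columns as ordinary dimensions. A minimal sketch, assuming a known maximum depth and the orgId/parentOrgId/isLeaf fields from the question; the function names and sample data are hypothetical:

```python
def ancestor_path(org_id, parent_of):
    """Return org IDs from the root down to org_id (assumes no cycles)."""
    path = []
    current = org_id
    while current is not None:
        path.append(current)
        current = parent_of.get(current)
    return path[::-1]


def flatten_hierarchy(orgs, max_levels):
    """Map each leaf orgId to a fixed-width list of level IDs, padded with None."""
    parent_of = {org["orgId"]: org["parentOrgId"] for org in orgs}
    flat = {}
    for org in orgs:
        if not org["isLeaf"]:
            continue
        path = ancestor_path(org["orgId"], parent_of)
        if len(path) > max_levels:
            raise ValueError(f"org {org['orgId']} is deeper than {max_levels} levels")
        flat[org["orgId"]] = path + [None] * (max_levels - len(path))
    return flat


# Made-up sample data: 1 is the root, 3 and 4 are leaves.
orgs = [
    {"orgId": 1, "parentOrgId": None, "orgName": "HQ", "isLeaf": False},
    {"orgId": 2, "parentOrgId": 1, "orgName": "East", "isLeaf": False},
    {"orgId": 3, "parentOrgId": 2, "orgName": "Shop A", "isLeaf": True},
    {"orgId": 4, "parentOrgId": 1, "orgName": "Shop B", "isLeaf": True},
]
flat = flatten_hierarchy(orgs, max_levels=3)
```

Each leaf row then carries its ancestors as level columns, so grouping by any level column aggregates the leaf-level facts up to that level.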

Re: Unsubscribe

2023-10-12 Thread lee

Unsubscribe

> On Oct 11, 2023, at 13:07, 许颖众 wrote:
> 
>  
>  
> From: hit_la...@126.com on behalf of Xiaoxiang Yu
> Sent: August 2, 2023 17:39
> To: user@kylin.apache.org; leil...@zqykj.com; 1163629...@qq.com
> Subject: Re: Unsubscribe
>
> Hey, to unsubscribe from the user mailing list, please follow these steps:
>
> First, send an email with any text to user-unsubscr...@kylin.apache.org.
> Within several minutes you will receive a confirmation email titled
> "confirm unsubscribe from user@kylin.apache.org".
>
> Then, reply (with any text) to that confirmation email to confirm your
> unsubscribe request.
> You will receive a final email titled "GOODBYE from
> user@kylin.apache.org", which means you have
> unsubscribed successfully.
>
> The official guide:
> https://www.apache.org/foundation/mailinglists.html#request-confirmation
>  
> --
> Best wishes to you ! 
> From ：Xiaoxiang Yu
>  
> At 2023-08-02 16:53:32, "雷利娜" <leil...@zqykj.com> wrote:
>
> Unsubscribe
> Hello, for work reasons I would like to unsubscribe from all related emails.
> Thank you!
> 
> Disclaimer：The information transmitted is intended only for the person or 
> entity to which it is addressed and may contain confidential and/or 
> privileged material. Any review, retransmission, dissemination or other use 
> of, or taking of any action in reliance upon, this information by persons or 
> entities other than the intended recipient is prohibited. If you received 
> this in error , please contact the sender and delete the material from any 
> computer .
>

Re: Kylin defaut storage system is HDFS?

2023-09-02 Thread marc nicole

Hi Yu,

This link, https://kylin.apache.org/docs31/tutorial/setup_jdbc_datasource.html,
says: "Since v2.3.0 Apache Kylin starts to support JDBC as the
third type of data source (after Hive, Kafka)".

So, according to the link above, the answer to my question of whether I can use
MySQL as an alternative to Hive is yes, or am I wrong?

On Mon, Aug 28, 2023 at 04:48, Xiaoxiang Yu wrote:

> Hi,
> For Kylin 5, you have to use distributed storage. The default
> choice is HDFS, and the alternative is cloud storage (such as S3).
> You can NOT deploy and run Kylin without distributed storage.
> Besides, you need an RDBMS as the metastore, ZooKeeper for service
> discovery, a Spark cluster as the compute service, and a Hive Metastore
> for looking up databases and tables.
> Finally, HBase is NOT necessary at all for Kylin 4.0 or higher.
>
> As for the question 'Could I use Kylin with just MySQL + Sqoop? (no Hive)',
> the answer is no: you need to install and deploy ZooKeeper, distributed
> storage (HDFS or cloud storage), a Spark cluster, and a Hive metastore.
> Here is a diagram that may be helpful:
> https://kylin.apache.org/images/blog/kylin4_on_cloud/3_kylin_cluster.jpg
>
> Here are some links:
> - https://kylin.apache.org/blog/2022/04/20/kylin4-on-cloud-part1/
> -
> https://kylin.apache.org/5.0/docs/deployment/on-premises/installation/platform/install_on_apache_hadoop
>
>
> 
> With warm regard
> Xiaoxiang Yu
>
>
>
> On Sat, Aug 26, 2023 at 8:03 PM marc nicole  wrote:
>
>> Hello,
>>
>> I have a few questions regarding storage for Kylin:
>>
>> I was wondering whether Kylin would work normally if I don't configure it to
>> work with any storage tool (such as MySQL with Sqoop, or Hive). Would it
>> then automatically use HDFS?
>>
>> Also, is configuring HBase necessary?
>>
>> Could I use Kylin with just MySQL + Sqoop (no Hive)?
>> What is the use of HBase if the storage normally used is Hive?
>>
>> Thanks. Regards
>>
>

Re: Kylin defaut storage system is HDFS?

2023-08-27 Thread Xiaoxiang Yu

Hi,
For Kylin 5, you have to use distributed storage. The default
choice is HDFS, and the alternative is cloud storage (such as S3).
You can NOT deploy and run Kylin without distributed storage.
Besides, you need an RDBMS as the metastore, ZooKeeper for service
discovery, a Spark cluster as the compute service, and a Hive Metastore
for looking up databases and tables.
Finally, HBase is NOT necessary at all for Kylin 4.0 or higher.

As for the question 'Could I use Kylin with just MySQL + Sqoop? (no Hive)',
the answer is no: you need to install and deploy ZooKeeper, distributed
storage (HDFS or cloud storage), a Spark cluster, and a Hive metastore.
Here is a diagram that may be helpful:
https://kylin.apache.org/images/blog/kylin4_on_cloud/3_kylin_cluster.jpg

Here are some links:
- https://kylin.apache.org/blog/2022/04/20/kylin4-on-cloud-part1/
-
https://kylin.apache.org/5.0/docs/deployment/on-premises/installation/platform/install_on_apache_hadoop

With warm regard
Xiaoxiang Yu

On Sat, Aug 26, 2023 at 8:03 PM marc nicole  wrote:

> Hello,
>
> I have a few questions regarding storage for Kylin:
>
> I was wondering whether Kylin would work normally if I don't configure it to
> work with any storage tool (such as MySQL with Sqoop, or Hive). Would it
> then automatically use HDFS?
>
> Also, is configuring HBase necessary?
>
> Could I use Kylin with just MySQL + Sqoop (no Hive)?
> What is the use of HBase if the storage normally used is Hive?
>
> Thanks. Regards
>

Re: measure column showing 1 as values instead of the actual values in Kylin SQL Query answer table

2023-08-02 Thread P.F. ZHAN

Do you mean extracting data from the fact table? Define a model based on this
table with the dimensions and this measure, then create an aggregation group with
these dimensions and the measure, and build it to get the data you want. Or do
you mean the metadata file? In that case, just query the database or dump the
metadata to check what you need.

On Wed, Aug 2, 2023 at 20:22 marc nicole  wrote:

> The dataset (CSV) is in this form:
>
> att1 | att2 | count_measure
> str1 | str2 | int1
> ...
>
> How do I extract the fact table (the count_measure column) and the
> dimensions from it in the data source / model definition in Kylin?
>
> On Wed, Aug 2, 2023 at 13:32, marc nicole wrote:
>
>> In the model, the measure definition is as follows (I don't understand
>> Chinese):
>>
>> {
>>   "name": "COUNT1",
>>   "function": {
>>     "expression": "COUNT",
>>     "parameter": {
>>       "type": "column",
>>       "value": "attributeName.COUNT1"
>>     },
>>     "returntype": "bigint"
>>   }
>> },
>>
>>
>> On Wed, Aug 2, 2023 at 05:31, Xiaoxiang Yu wrote:
>>
>>> Did you check the answer with Hive/SparkSQL? Does Hive/SparkSQL give the
>>> expected answer?
>>> If you have checked, could you give us your cube
>>> definition (CubeDesc in JSON) and the SQL statement you queried, so we can
>>> discuss in detail?
>>>
>>>
>>>
>>> --
>>> *Best wishes to you ! *
>>> *From ：**Xiaoxiang Yu*
>>>
>>>
>>> At 2023-08-02 04:35:04, "marc nicole"  wrote:
>>>
>>> The measure is of type column (not constant) and is bigint. I selected
>>> the measure from the dropdown correctly as well. The measure column returns 1
>>> for all rows instead of the actual values when querying the cube. What
>>> could be the underlying problem? The cube or model definition? Or maybe the
>>> data source attribute types?
>>>
>>> Maybe I should create a lookup table alongside the fact table (which I am not
>>> doing so far)?
>>>
>>> Why is the measure column in the query result showing only 1 as values?
>>>
>>>
>>>
>>>

Re: measure column showing 1 as values instead of the actual values in Kylin SQL Query answer table

2023-08-02 Thread marc nicole

The dataset (CSV) is in this form:

att1 | att2 | count_measure
str1 | str2 | int1
...

How do I extract the fact table (the count_measure column) and the
dimensions from it in the data source / model definition in Kylin?

On Wed, Aug 2, 2023 at 13:32, marc nicole wrote:

> In the model, the measure definition is as follows (I don't understand
> Chinese):
>
> {
>   "name": "COUNT1",
>   "function": {
>     "expression": "COUNT",
>     "parameter": {
>       "type": "column",
>       "value": "attributeName.COUNT1"
>     },
>     "returntype": "bigint"
>   }
> },
>
>
> On Wed, Aug 2, 2023 at 05:31, Xiaoxiang Yu wrote:
>
>> Did you check the answer with Hive/SparkSQL? Does Hive/SparkSQL give the
>> expected answer?
>> If you have checked, could you give us your cube definition (CubeDesc
>> in JSON) and the SQL statement you queried, so we can discuss in detail?
>>
>>
>>
>> --
>> *Best wishes to you ! *
>> *From ：**Xiaoxiang Yu*
>>
>>
>> At 2023-08-02 04:35:04, "marc nicole"  wrote:
>>
>> The measure is of type column (not constant) and is bigint. I selected
>> the measure from the dropdown correctly as well. The measure column returns 1
>> for all rows instead of the actual values when querying the cube. What
>> could be the underlying problem? The cube or model definition? Or maybe the
>> data source attribute types?
>>
>> Maybe I should create a lookup table alongside the fact table (which I am not
>> doing so far)?
>>
>> Why is the measure column in the query result showing only 1 as values?
>>
>>
>>
>>

Re: measure column showing 1 as values instead of the actual values in Kylin SQL Query answer table

2023-08-02 Thread marc nicole

In the model, the measure definition is as follows (I don't understand
Chinese):

{
  "name": "COUNT1",
  "function": {
    "expression": "COUNT",
    "parameter": {
      "type": "column",
      "value": "attributeName.COUNT1"
    },
    "returntype": "bigint"
  }
},
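
One possible explanation, which nobody in this thread confirms: a measure whose expression is COUNT counts rows rather than adding up the column's values, so if every combination of att1 and att2 occurs only once in the data, each group yields 1. The sketch below imitates the two aggregations in plain Python on made-up rows; `aggregate` is a hypothetical helper, not a Kylin API.

```python
from collections import defaultdict

# Made-up rows shaped like the CSV described in this thread.
rows = [
    {"att1": "a", "att2": "x", "count_measure": 7},
    {"att1": "b", "att2": "y", "count_measure": 12},
    {"att1": "c", "att2": "z", "count_measure": 3},
]


def aggregate(rows, keys, column, expression):
    """Group rows by the key columns and apply COUNT or SUM to one column."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[key] for key in keys)].append(row[column])
    if expression == "COUNT":
        return {group: len(values) for group, values in groups.items()}
    if expression == "SUM":
        return {group: sum(values) for group, values in groups.items()}
    raise ValueError(f"unsupported expression: {expression}")


counted = aggregate(rows, ["att1", "att2"], "count_measure", "COUNT")
summed = aggregate(rows, ["att1", "att2"], "count_measure", "SUM")
```

If this is what is happening, defining the measure with SUM on the same column should return the stored values, but that needs to be checked against the actual cube definition and query.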


On Wed, Aug 2, 2023 at 05:31, Xiaoxiang Yu wrote:

> Did you check the answer with Hive/SparkSQL? Does Hive/SparkSQL give the
> expected answer?
> If you have checked, could you give us your cube definition (CubeDesc
> in JSON) and the SQL statement you queried, so we can discuss in detail?
>
>
>
> --
> *Best wishes to you ! *
> *From ：**Xiaoxiang Yu*
>
>
> At 2023-08-02 04:35:04, "marc nicole"  wrote:
>
> The measure is of type column (not constant) and is bigint. I selected the
> measure from the dropdown correctly as well. The measure column returns 1 for
> all rows instead of the actual values when querying the cube. What
> could be the underlying problem? The cube or model definition? Or maybe the
> data source attribute types?
>
> Maybe I should create a lookup table alongside the fact table (which I am not
> doing so far)?
>
> Why is the measure column in the query result showing only 1 as values?
>
>
>
>

Re: Unsubscribe

2023-08-02 Thread Xiaoxiang Yu

Hey, to unsubscribe from the user mailing list, please follow these steps:

First, send an email with any text to user-unsubscr...@kylin.apache.org.
Within several minutes you will receive a confirmation email titled
"confirm unsubscribe from user@kylin.apache.org".

Then, reply (with any text) to that confirmation email to confirm your
unsubscribe request.
You will receive a final email titled "GOODBYE from
user@kylin.apache.org", which means you have unsubscribed successfully.

The official guide:
https://www.apache.org/foundation/mailinglists.html#request-confirmation




--

Best wishes to you ! 
From ：Xiaoxiang Yu




At 2023-08-02 16:53:32, "雷利娜" wrote:

Unsubscribe

Hello, for work reasons I would like to unsubscribe from all related emails.
Thank you!

Re: measure column showing 1 as values instead of the actual values in Kylin SQL Query answer table

2023-08-02 Thread Xiaoxiang Yu

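I guess Shaofeng has already answered your question in the WeChat group. His
answer follows:

Make the fields from the dimension table "normal" dimensions. The data of those
fields is then persisted into the cube, and building a new segment will not
change the results of earlier builds.
If a field is not a "normal" dimension, it lives in the dimension table
snapshot, so its data changes once the snapshot is refreshed.

May I ask why you are not considering normal dimensions (once a field is set as
a normal dimension, no dimension table snapshot is generated)?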
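
The difference between the two kinds of dimension can be imitated in a few lines of Python. This is only a toy model of the behaviour described above, with made-up data and hypothetical function names; it is not how Kylin implements snapshots.

```python
def build_segment(fact_rows, dim_table):
    """Normal dimension: the dimension value is stored in the segment at build time."""
    return [{**row, "dim_name": dim_table[row["dim_id"]]} for row in fact_rows]


def query_with_snapshot(fact_rows, latest_snapshot):
    """Snapshot-based dimension: the value is looked up in the latest snapshot at query time."""
    return [{**row, "dim_name": latest_snapshot[row["dim_id"]]} for row in fact_rows]


facts_day1 = [{"dim_id": 1, "amount": 10}]
dim_before = {1: "old name"}  # dimension table when the first segment was built
dim_after = {1: "new name"}   # dimension table after a later build refreshed it

segment_day1 = build_segment(facts_day1, dim_before)
snapshot_view = query_with_snapshot(facts_day1, dim_after)
```

The stored segment keeps "old name", while the snapshot lookup returns "new name", which matches the symptom reported in this thread.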







--

Best wishes to you ! 
From ：Xiaoxiang Yu




At 2023-08-02 15:45:57, "李甜彪"  wrote:

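The problem is that the dimension data has to change with every build, which
causes earlier builds to use the latest dimension data as well. This does not
happen when the builds run on different days, so it still looks like an issue
with how the dimension snapshot is refreshed. How can the dimension tables
associated with already-built data stay in the state they were in at build
time, even after the dimension data changes?

李甜彪
ltb1...@163.com

---- Replied Message ----
From: P.F. ZHAN
Date: 8/2/2023 15:34
Subject: Re: measure column showing 1 as values instead of the actual values
in Kylin SQL Query answer table

In that case, shouldn't you set the columns you need to query as dimensions and
have them precomputed and stored in the cube? The dimension data in a cube does
not change unless you refresh it.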




On Wed, Aug 2, 2023 at 11:35 李甜彪  wrote:

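Could you help me with a problem I have run into? How can I keep the dimension
table snapshot from always taking the latest data? Using Kylin 4.0.3, when I
build data for different date partitions on the same day, the dimension table
has to be switched between builds, but the last build of the day causes the
data for the other dates built that day to use the dimension table associated
with that last build as well. What I want is for the dimension table data to
change with each build while the data already built that day stays unchanged.
Is there a way to achieve this?

李甜彪
ltb1...@163.com

---- Replied Message ----
From: Xiaoxiang Yu
Date: 8/2/2023 11:31
Subject: Re: measure column showing 1 as values instead of the actual values
in Kylin SQL Query answer table

Did you check the answer with Hive/SparkSQL? Does Hive/SparkSQL give the expected
answer?
If you have checked, could you give us your cube definition (CubeDesc in
JSON) and the SQL statement you queried, so we can discuss in detail?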







--

Best wishes to you ! 
From ：Xiaoxiang Yu




At 2023-08-02 04:35:04, "marc nicole"  wrote:

The measure is of type column (not constant) and is bigint. I selected the
measure from the dropdown correctly as well. The measure column returns 1 for all
rows instead of the actual values when querying the cube. What could be
the underlying problem? The cube or model definition? Or maybe the data source
attribute types?

Maybe I should create a lookup table alongside the fact table (which I am not
doing so far)?

Why is the measure column in the query result showing only 1 as values?

Re: measure column showing 1 as values instead of the actual values in Kylin SQL Query answer table

2023-08-02 Thread 李甜彪

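The problem is that the dimension data has to change with every build, which
causes earlier builds to use the latest dimension data as well. This does not
happen when the builds run on different days, so it still looks like an issue
with how the dimension snapshot is refreshed. How can the dimension tables
associated with already-built data stay in the state they were in at build
time, even after the dimension data changes?

李甜彪
ltb1...@163.com

---- Replied Message ----
From: P.F. ZHAN
Date: 8/2/2023 15:34
Subject: Re: measure column showing 1 as values instead of the actual values
in Kylin SQL Query answer table

In that case, shouldn't you set the columns you need to query as dimensions and
have them precomputed and stored in the cube? The dimension data in a cube does
not change unless you refresh it.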




On Wed, Aug 2, 2023 at 11:35 李甜彪  wrote:

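Could you help me with a problem I have run into? How can I keep the dimension
table snapshot from always taking the latest data? Using Kylin 4.0.3, when I
build data for different date partitions on the same day, the dimension table
has to be switched between builds, but the last build of the day causes the
data for the other dates built that day to use the dimension table associated
with that last build as well. What I want is for the dimension table data to
change with each build while the data already built that day stays unchanged.
Is there a way to achieve this?

李甜彪
ltb1...@163.com

---- Replied Message ----
From: Xiaoxiang Yu
Date: 8/2/2023 11:31
Subject: Re: measure column showing 1 as values instead of the actual values
in Kylin SQL Query answer table

Did you check the answer with Hive/SparkSQL? Does Hive/SparkSQL give the expected
answer?
If you have checked, could you give us your cube definition (CubeDesc in
JSON) and the SQL statement you queried, so we can discuss in detail?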







--

Best wishes to you ! 
From ：Xiaoxiang Yu




At 2023-08-02 04:35:04, "marc nicole"  wrote:

The measure is of type column (not constant) and is bigint. I selected the
measure from the dropdown correctly as well. The measure column returns 1 for all
rows instead of the actual values when querying the cube. What could be
the underlying problem? The cube or model definition? Or maybe the data source
attribute types?

Maybe I should create a lookup table alongside the fact table (which I am not
doing so far)?

Why is the measure column in the query result showing only 1 as values?

Re: measure column showing 1 as values instead of the actual values in Kylin SQL Query answer table

2023-08-02 Thread P.F. ZHAN

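In that case, shouldn't you set the columns you need to query as dimensions and
have them precomputed and stored in the cube? The dimension data in a cube does
not change unless you refresh it.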


On Wed, Aug 2, 2023 at 11:35 李甜彪  wrote:

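> Could you help me with a problem I have run into? How can I keep the dimension
> table snapshot from always taking the latest data?
> Using Kylin 4.0.3, when I build data for different date partitions on the same
> day, the dimension table has to be switched between builds, but the last build
> of the day causes the data for the other dates built that day to use the
> dimension table associated with that last build as well. What I want is for the
> dimension table data to change with each build while the data already built
> that day stays unchanged. Is there a way to achieve this?
> 李甜彪
> ltb1...@163.com
>
> ---- Replied Message ----
> From: Xiaoxiang Yu
> Date: 8/2/2023 11:31
> Subject: Re: measure column showing 1 as values instead of the actual
> values in Kylin SQL Query answer table
> Did you check the answer with Hive/SparkSQL? Does Hive/SparkSQL give the
> expected answer?
> If you have checked, could you give us your cube definition (CubeDesc
> in JSON) and the SQL statement you queried, so we can discuss in detail?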
>
>
>
> --
> *Best wishes to you ! *
> *From ：**Xiaoxiang Yu*
>
>
> At 2023-08-02 04:35:04, "marc nicole"  wrote:
>
> The measure is of type column (not constant) and is bigint. I selected the
> measure from the dropdown correctly as well. The measure column returns 1 for
> all rows instead of the actual values when querying the cube. What
> could be the underlying problem? The cube or model definition? Or maybe the
> data source attribute types?
>
> Maybe I should create a lookup table alongside the fact table (which I am not
> doing so far)?
>
> Why is the measure column in the query result showing only 1 as values?
>
>
>
>

Re: measure column showing 1 as values instead of the actual values in Kylin SQL Query answer table

2023-08-01 Thread 李甜彪

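Could you help me with a problem I have run into? How can I keep the dimension
table snapshot from always taking the latest data? Using Kylin 4.0.3, when I
build data for different date partitions on the same day, the dimension table
has to be switched between builds, but the last build of the day causes the
data for the other dates built that day to use the dimension table associated
with that last build as well. What I want is for the dimension table data to
change with each build while the data already built that day stays unchanged.
Is there a way to achieve this?

李甜彪
ltb1...@163.com

---- Replied Message ----
From: Xiaoxiang Yu
Date: 8/2/2023 11:31
Subject: Re: measure column showing 1 as values instead of the actual values
in Kylin SQL Query answer table

Did you check the answer with Hive/SparkSQL? Does Hive/SparkSQL give the expected
answer?
If you have checked, could you give us your cube definition (CubeDesc in
JSON) and the SQL statement you queried, so we can discuss in detail?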







--

Best wishes to you ! 
From ：Xiaoxiang Yu




At 2023-08-02 04:35:04, "marc nicole"  wrote:

The measure is of type column (not constant) and is bigint. I selected the
measure from the dropdown correctly as well. The measure column returns 1 for all
rows instead of the actual values when querying the cube. What could be
the underlying problem? The cube or model definition? Or maybe the data source
attribute types?

Maybe I should create a lookup table alongside the fact table (which I am not
doing so far)?

Why is the measure column in the query result showing only 1 as values?

Re: Problems encountered during the use of kylin 4.0.3

2023-07-21 Thread ShaoFeng Shi

1. Did you configure a BI tool or a monitoring system against Kylin that sends
the "select 1" query every second? Kylin itself won't do that.

2. If you deleted the test0718 project, any request to that project will
get the "Cannot find project" error. This might have the same cause as the
first issue.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC,
Apache Incubator PMC,
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org




杨冠军 <15563907...@163.com> wrote on Fri, Jul 21, 2023 at 14:12:

> Hello
>
> Recently, I encountered the following two issues while using Kylin 4.0.3,
> which are difficult to solve, and I would like to ask for your help:
>
> 1. A test project, test0718, was created in Kylin, and data was built in
> this project and queried. Since then, a "- [QUERY] -" log entry has been
> printed every second in kylin.log, with the SQL in the query being
> "select 1"; it is executed once per second.
>
> 2. After deleting the test project test0718 in Kylin, an error is reported
> every ten minutes in kylin.log. The error is: controller.BasicController:98:
>
> org.apache.kylin.rest.exception.BadRequestException: Cannot find
> project 'test0718';
>
> May I ask what triggers these two situations? Thank you~
>

Re: Cannot sync Hive partitioned table,Cannot get Hive TableMeta,i want to ask if can solve this Bug in Kylin 3.1.3 hadoop version?

2023-07-18 Thread ShaoFeng Shi

I replied in JIRA; copying the reply here as well:

If you can provide the error log from Kylin's backend, that would help. I
think Hive 3.1 might be too new, because Kylin 3.0 is compiled with Hive
1.1. Maybe you can try to build with your Hive version:
https://github.com/apache/kylin/blob/kylin-3.0.1/pom.xml

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC,
Apache Incubator PMC,
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org




Zhaoyi Huang (NSB) wrote on Sun, Jun 11, 2023 at 10:36:

> Hi,
>
> I want to ask whether this bug can be solved in the Kylin 3.1.3 Hadoop version.
>
>
>
> [KYLIN-4883] Cannot sync Hive partitioned table - Ooops.. Cannot get Hive
> TableMeta - ASF JIRA (apache.org)
> 
>
>
>
>- *Environment:*
>
> Redhat 7.4
> hadoop 3.1.1
> hbase 2.2.6
> hive 3.1.1
> kylin 3.1.3
>
> Kafka 2.0
>
>
>
>
>

Re: Drill-Across two cubes sharing one common dimension

2023-06-09 Thread marc nicole

Just to add: I want to accomplish that, in the Development env (no Hadoop)
under Windows. Is it possible?

On Fri, Jun 9, 2023 at 22:23, marc nicole wrote:

> Hello guys,
>
> I want to ask if it is possible to do a drill-across operation on two
> cubes based in Kylin that share one dimension in common and have two
> different fact tables?
>
> How to do that in Python?
>
> Thanks.
>
>
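
Nobody answered the Python part in this thread. In general, a drill-across can be done on the client side: query each cube separately, aggregated by the shared dimension, then join the two result sets on that dimension. A minimal sketch with made-up result rows; the column names and `drill_across` are hypothetical, and fetching the rows from Kylin is left out:

```python
def drill_across(result_a, result_b, shared_dim):
    """Full outer join of two aggregated result sets on a shared dimension."""
    merged = {}
    for row in list(result_a) + list(result_b):
        key = row[shared_dim]
        merged.setdefault(key, {shared_dim: key}).update(row)
    return [merged[key] for key in sorted(merged)]


# Made-up aggregates, one result set per cube, both grouped by "region".
sales = [
    {"region": "EU", "total_sales": 100},
    {"region": "US", "total_sales": 250},
]
returns = [
    {"region": "EU", "total_returns": 5},
    {"region": "APAC", "total_returns": 2},
]

combined = drill_across(sales, returns, "region")
```

Regions that appear in only one cube keep just that cube's measure, so the caller can decide how to treat the missing values.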

Re: Kylin 4.2 - cleanup not working

2023-05-23 Thread ShaoFeng Shi

Hello,

It seems that message should be a warning, instead of error:
https://github.com/apache/kylin/blob/main/server-base/src/main/java/org/apache/kylin/rest/job/StorageCleanupJob.java#L230

For each cube, it checks whether the path
"/parquet/" exists or not. If it exists, it will
further check its subfolders (corresponding to each segment). If not, it
just gives a warning.
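
A rough Python paraphrase of the check described above, for readers who do not want to read the Java source. The directory layout (a parquet folder per project, with one subfolder per cube and one per segment) is inferred from the path in the reported message, and `check_cube_paths` is a hypothetical name; the real job may differ in details.

```python
import tempfile
from pathlib import Path


def check_cube_paths(project_root, cube_names):
    """Return (segment folders per existing cube, warnings for missing cube paths)."""
    segment_dirs, warnings = {}, []
    for cube in cube_names:
        cube_path = Path(project_root) / "parquet" / cube
        if cube_path.is_dir():
            # Each subfolder is assumed to correspond to one segment.
            segment_dirs[cube] = sorted(p.name for p in cube_path.iterdir() if p.is_dir())
        else:
            warnings.append(f"Cube path doesn't exist! The path is {cube_path}")
    return segment_dirs, warnings


with tempfile.TemporaryDirectory() as root:
    (Path(root) / "parquet" / "cube1" / "segment_a").mkdir(parents=True)
    found, warned = check_cube_paths(root, ["cube1", "cube2"])
```

In this reading, a missing cube folder only produces a message and does not stop the cleanup of the other cubes.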

Hope it helps.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC,
Apache Incubator PMC,
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org




Singh Sonu wrote on Mon, May 15, 2023 at 14:14:

> Hi Experts,
>
> Any help or suggestions will be appreciated.
>
> I am facing an issue while cleaning up unused Parquet files in HDFS.
> Command: bin/kylin.sh org.apache.kylin.tool.StorageCleanupJob --delete true
>
> Error: job.StorageCleanupJob:222: Cube path doesn't exist! The path is
> file:/apps/kylin/kylin_metadata/project_a/parquet/cube1
>
> During a full or incremental cube build, Kylin is not removing unwanted or
> unused segments from the parquet folder in HDFS.
>
>
>
>  You can reach me out at
>  Email- sonusingh.javat...@gmail.com
>
>  with regards,
>  Sonu Kumar Singh
>

Re: MDX interface for EXCEL from Docker image gets login failed

2023-04-19 Thread ShaoFeng Shi

Good to know you solved it by checking the document :-)

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC,
Apache Incubator PMC,
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org




Łukasz Stefański SoSimple wrote on Thu, Apr 13, 2023 at 01:29:

> Problem solved with tutorial:
> https://kylin.apache.org/docs/tutorial/quick_start_for_mdx.html
>
>
>
>
>
> *From:* Łukasz Stefański SoSimple 
> *Sent:* Wednesday, April 12, 2023 4:07 PM
> *To:* user@kylin.apache.org
> *Subject:* MDX interface for EXCEL from Docker image gets login failed
>
>
>
> Hi
>
> I am using the Kylin Docker image for now. But after building the cube I
> discovered that it is not possible to connect to the cube from Excel or PowerBI.
>
> I am getting "login failed" for user ADMIN/KYLIN.
>
>
>
> The logs below show a login attempt from Excel:
>
>
>
> File mdx.log :
>
>
>
> 2023-04-07 04:02:59,273 [WARN ] [Query
> f3f4b028-1bde-4c2d-e1a6-e1d0d219d3c1] i.k.m.i.s.f.MdxServiceFilter.doFilter
> - [MDX-04010001] please add auth info in your request.
>
> 2023-04-07 04:02:59,331 [INFO ] [Query
> 5202bd61-6de4-5925-12c9-c43ccb98a3a6]
> i.k.m.w.x.MdxXmlaServlet.prepareMondrianSchema - begin init datasource,
> username=ANALYST, project=learn_kylin
>
> 2023-04-07 04:02:59,331 [ERROR] [Query
> 5202bd61-6de4-5925-12c9-c43ccb98a3a6] i.k.m.i.s.f.MdxServiceFilter.doFilter
> - internal error
>
> io.kylin.mdx.insight.common.SemanticException: The connection user
> information or password maybe empty or has been changed, please contact
> system admin to update in Configuration page under Management.
>
> at
> io.kylin.mdx.insight.core.support.SemanticFacade.getSemanticProjectByUser(SemanticFacade.java:62)
> ~[semantic-core-1.2.0.jar!/:?]
>
> at
> io.kylin.mdx.core.service.ModelManager.buildMondrianSchemaFromDataSet(ModelManager.java:70)
> ~[mdx-1.2.0.jar!/:?]
>
> at
> io.kylin.mdx.web.xmla.XmlaDatasource.loadMdnSchemas(XmlaDatasource.java:108)
> ~[mdx-1.2.0.jar!/:?]
>
> at
> io.kylin.mdx.web.xmla.XmlaDatasource.initDatasource(XmlaDatasource.java:90)
> ~[mdx-1.2.0.jar!/:?]
>
> at
> io.kylin.mdx.web.xmla.MdxXmlaServlet.prepareMondrianSchema(MdxXmlaServlet.java:216)
> ~[mdx-1.2.0.jar!/:?]
>
> at
> io.kylin.mdx.web.xmla.MdxXmlaServlet.process(MdxXmlaServlet.java:103)
> ~[mdx-1.2.0.jar!/:?]
>
> at mondrian.xmla.XmlaServlet.doPost(XmlaServlet.java:119)
> ~[olap4j-xmlaserver-1.2.0.jar!/:?]
>
>
>
> Can you please help fix this Docker setup?
>
>
>
> Łukasz
>

RE: MDX interface for EXCEL from Docker image gets login failed

2023-04-12 Thread Łukasz Stefański SoSimple

Problem solved with tutorial:  
 
https://kylin.apache.org/docs/tutorial/quick_start_for_mdx.html

 

 


Re: Question about setup Kylin4.0 in CDH

2023-03-23 Thread ShaoFeng Shi

Hi Lea,

Yes, we have users running Kylin 4 on CDP 7, but I'm not sure whether it is
exactly the same version as yours. As I remember, there are some jar conflicts
(servlet or JSP related) that need some manual work. If you can provide
the detailed log messages you got, that would be great.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC,
Apache Incubator PMC,
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org




Chu, Lea wrote on Wed, Mar 8, 2023 at 15:42:

> Hi all users in Kylin,
>
>
>
> I’m Lea from Taiwan Garmin. I’m a beginner with Kylin and would like to set up
> Kylin in our Hadoop cluster.
>
> I saw in the Installation Guide that Kylin 4.0 passes the tests on Cloudera
> CDH 6.3.2 (https://kylin.apache.org/docs/install/index.html).
> Unfortunately, our Hadoop cluster is built on CDH 7.1.7, so I would like to
> know whether other users have set up Kylin 4.0 and tested it on CDH
> 7.x. Thank you.
>
>
>
> Regards,
>
> Lea
>

Re: Kylin features

2023-03-23 Thread ShaoFeng Shi

Hello Renjie,

I didn't fully understand the question. If you can provide more detailed
information, such as a sample, that would help other people
answer.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC,
Apache Incubator PMC,
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org




王仁杰 wrote on Mon, Feb 13, 2023 at 16:48:

> I'd like to ask: does Kylin currently provide a solution for recursive queries over parent-child dimensions?
>

Re: Kylin Compatibility issue

2023-02-20 Thread ShaoFeng Shi

Hello Ibrar,

I just replied in the JIRA. For such problems it is better to discuss
on the mailing list first rather than in JIRA, because JIRA is mainly for
feature/bug/task management. It appears that you haven't subscribed to the mailing
list, so your email was blocked. I approved it manually. To proceed,
please finish subscribing.

For how to subscribe, please see this example (replace "inlong"
with the name of the project you want to subscribe to):
https://inlong.apache.org/community/how-to-subscribe/

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC,
Apache Incubator PMC,
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org




Ibrar Ahmed wrote on Tue, Feb 21, 2023 at 10:34:

> Hi Community,
> please have a look at the following JIRA ticket:
> https://issues.apache.org/jira/browse/KYLIN-5453.
> and update on the ticket.
>
> Regards:
> Ibrar Ahmed
> --
>
> Thanks!!
>
>
>
> Ibrar Ahmed | Staff. Data Engineer
>
> *10Pearls*
>
> Digital Innovation & Acceleration Partner
>
> www.10pearls.com
>
>
> Wash DC | San Fran | London | Karachi | Dubai | Medellin
>
> *EY Entrepreneur of the Year Finalist (CEO)*
>
> *Inc. 5000*
>

Re: Does Kylin 5.0 do not supported JDBC data source, only old version can supported?

2022-12-13 Thread WangYong

One more question: we went back to Kylin 4.0, but found that the Web UI only 
supports loading tree data from Hive. How can we load data from SQL Server or MySQL 
in version 4.0?

From: on behalf of Xiaoxiang Yu 
Reply-To: 
Date: Tuesday, December 13, 2022, 6:51 PM
To: 
Cc: 
Subject: Re:Does Kylin 5.0 do not supported JDBC data source, only old version can 
supported?

> From the source code
> (https://github.com/apache/kylin/tree/kylin5/src/datasource-sdk), we can
> see that the JDBC data source is still supported in the backend and is still under
> development, but it is not shown in the Web UI at the moment. JDBC source is a high
> priority task, but if more users are interested in it, community developers
> may increase its priority.

--

Best wishes to you ! 

From: Xiaoxiang Yu

At 2022-12-13 11:34:13, "WangYong"  wrote:

Does Kylin 5.0 not support JDBC data sources? Do only older versions support 
them?

--- Reference ---

In the 5.0 documentation:

Question: Besides Hive data source, what other data sources does Kylin support?

Answer: Kylin current only supports Hive.

In the 2.0 documentation:

Setup JDBC Data Source

Available since Apache Kylin v2.3.x

Supported JDBC data source

Since v2.3.0 Apache Kylin starts to support JDBC as the third type of data 
source (after Hive, Kafka). User can integrate Kylin with their SQL database or 
data warehouses like MySQL, Microsoft SQL Server and HP Vertica directly. Other 
relational databases are easy to support as well.

Re: Does Kylin 5.0 do not supported JDBC data source, only old version can supported?

2022-12-13 Thread WangYong

Good news, thanks.

This is an important feature, because JDBC data sources are widely used in 
enterprises. Looking forward to the Web UI version and the updated Docker version.


Re: KYLIN custom aggregate functions

2022-11-27 Thread Xiaoxiang Yu

I think this video tutorial may be helpful.


https://www.bilibili.com/video/BV1Wt4y1s7rZ

--

Best wishes to you ! 
From ：Xiaoxiang Yu




At 2022-11-27 16:56:02, "chinacsj" wrote:

Could you publish some video tutorials on using KYLIN?






-- Original Message --
From: ShaoFeng Shi
Sent: 2022-11-24 17:20:07
To: user
Cc: (none)
Subject: Re: KYLIN custom aggregate functions
Hi, it needs some development. You can refer to this folder for the measure 
aggregators:
https://github.com/apache/kylin/tree/main/core-metadata/src/main/java/org/apache/kylin/measure


To make it appear in the web page, you also need to modify the front-end code.



Best regards,


Shaofeng Shi 史少锋
Apache Kylin PMC,
Apache Incubator PMC,
Email: shaofeng...@apache.org


Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org









朱杨坤 wrote on Mon, Nov 21, 2022 at 10:25:

Question: in KYLIN 3.x, how can I add a custom aggregate function and have it appear among the measure options?

Re: KYLIN custom aggregate functions

2022-11-24 Thread ShaoFeng Shi

Hi, it needs some development. You can refer to this folder for the measure
aggregators:
https://github.com/apache/kylin/tree/main/core-metadata/src/main/java/org/apache/kylin/measure

To make it appear in the web page, you also need to modify the front-end code.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC,
Apache Incubator PMC,
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org




朱杨坤 wrote on Mon, Nov 21, 2022 at 10:25:

> Question: in KYLIN 3.x, how can I add a custom aggregate function and have it appear among the measure options?
>

Re: [DISCUSS] Move to Spark 3 totally in Kylin 4

2022-08-22 Thread ShaoFeng Shi

Thanks for Yang's comment. We will move to Spark 3 starting with the next release, which
will be Kylin 4.0.2.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC,
Apache Incubator PMC,
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org




Yang Li wrote on Thu, Aug 11, 2022 at 17:36:

> +1
>
> Spark 2 is out of maintenance. Due to security concerns, we should
> encourage all Kylin users to move away from Spark 2 and onboard Spark 3 for
> data safety.
>
> The best signal for this purpose is stopping releases that are known to
> contain security vulnerabilities.
>
> Regards
> Yang
>
> From: ShaoFeng Shi <shaofeng...@apache.org>
> Sent: Wednesday, August 10, 2022 10:37 AM
> To: dev <d...@kylin.apache.org>; user <user@kylin.apache.org>
> Subject: [DISCUSS] Move to Spark 3 totally in Kylin 4
>
> Hello Kylin community,
>
> As you know, Kylin 4.0 has supported both Spark 2.4 and Spark 3.1 from the very
> beginning. Recently, when we tried to fix some security vulnerabilities (e.g.,
> CVE-2022-22978), we
> found that Spark 2 is hard to make compatible with the recommended versions of
> Spring Core and Spring Security.
>
> Besides, we noticed that the latest Spark 2 release, v2.4.8, was released on
> May 17, 2021, almost 15 months ago, which means it is no longer actively
> maintained.
>
> So I propose that Kylin 4 move to Spark 3 entirely and no longer release
> packages for Spark 2. Legacy users, please upgrade your
> Spark.
>
> Your comments are welcomed.
>
> Best regards,
>
> Shaofeng Shi 史少锋
> Apache Kylin PMC,
> Apache Incubator PMC,
> Email: shaofeng...@apache.org
>
> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
> Join Kylin user mail group: user-subscr...@kylin.apache.org
> Join Kylin dev mail group: dev-subscr...@kylin.apache.org
>
>
>

Re: Connecting Presto to KYLIN

2022-06-10 Thread ShaoFeng Shi

Hi yangkun,

I'm curious about your scenario; are you trying to use Kylin as a source in
Presto, or use Presto as a data source in Kylin (just like Hive)?

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC,
Apache Incubator PMC,
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org




Mukvin wrote on Fri, Jun 10, 2022 at 14:30:

> Hi,
> Kylin currently doesn't have a Presto connector.
>
>
> --
> Best regards.
> Tengting Xu
>
>
> At 2022-06-09 14:36:24, "朱杨坤" wrote:
>
> Is there a driver package for connecting Presto to Kylin? Or a development demo?
>
>

Re: Re: File does not exist ....... fairscheduler.xml

2022-05-21 Thread Mukvin

Hi Gao Michael,


Spark 3.2.1 differs from Spark 3.1.1. Running Kylin on Spark 3.2.1 will only be 
possible after Kylin has been adapted to it.
You can run Spark 3.2.1 and Spark 3.1.1 side by side, but the two installations will split 
the resources of the cluster between them.
Currently, the "kylin-4.0.1-bin-spark3.tar.gz" package can only run on 
Spark 3.1.1.
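
A version mismatch like the one above can be caught before starting Kylin. The following is only a sketch: it assumes the installed version can be read from a line such as "Spark 3.1.1 built for Hadoop 3.2.0" (for example from the RELEASE file that ships in Spark binary distributions; the file name and its exact wording are assumptions to verify locally), and the function names are hypothetical.

```python
import re
from typing import Optional

# Version that the kylin-4.0.1-bin-spark3 package is said to support in this thread.
SUPPORTED_SPARK = "3.1.1"


def parse_spark_version(release_text: str) -> Optional[str]:
    """Extract 'X.Y.Z' from text such as 'Spark 3.1.1 built for Hadoop 3.2.0'."""
    match = re.search(r"Spark (\d+\.\d+\.\d+)", release_text)
    return match.group(1) if match else None


def is_supported(release_text: str, supported: str = SUPPORTED_SPARK) -> bool:
    """True only when the detected Spark version equals the supported one."""
    return parse_spark_version(release_text) == supported
```

The comparison is deliberately an exact match, since the thread says only 3.1.1 is supported by this package.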




--

Best regards.
Tengting Xu




At 2022-05-20 23:24:13, "Gao Michael" wrote:

Hi Tengting Xu

Thanks for your help, it works!

I downloaded spark3.2.1 and set SPARK_HOME in .bashrc to the path of spark 3.1.1; 
now I can import Hive tables via the Kylin web interface.

So is there any way to keep spark3.2.1 running on the cluster while Kylin runs 
on spark3.1.1?

 

Thanks!

 

 

From: Mukvin 
Sent: May 20, 2022 9:22
To: michael.ga...@hotmail.com; user@kylin.apache.org
Subject: Re: Re: Re: Re: Re: Re: File does not exist ... fairscheduler.xml

 

Hi Gao Michael,

I mean you can export SPARK_HOME=`path to spark 3.1.1` for Kylin; there is no need 
to downgrade 3.2.1.

You can keep them both.

--

Best regards.

Tengting Xu

 

At 2022-05-19 22:47:14, "Gao Michael"  wrote:

Hi Tengting Xu:

I already deployed spark3.2.1 on my cluster, you mean I should downgrade it to 
3.1.1?

 

 

From: Mukvin 
Sent: May 19, 2022 22:36
To: michael.ga...@hotmail.com; user@kylin.apache.org
Subject: Re: Re: Re: Re: File does not exist ... fairscheduler.xml

 

Hi Gao Michael,

As I checked the log file, I found two errors in it. 

"controller.TableController:131 : Failed to load Hive Table 
java.lang.StackOverflowError" and "File not found exception".

They are related to spark.

As you mentioned, your environment runs Spark 3.2.1, but currently 
kylin-4.0.1-bin-spark3.tar.gz only supports Spark 3.1.1, as Kylin's official 
site mentions, so you may need to download Spark 3.1.1 and export its path 
as SPARK_HOME. Then you can restart Kylin and check again.

 

--

Best regards.

Tengting Xu

 

At 2022-05-19 22:15:15, "Gao Michael"  wrote:

Hi Mukvin

The log file is too long, so I can only send parts of it as an attachment.

I imported the table via the Kylin web interface and watched the log file with the command “tail -f”.

The terminal watching the log file kept scrolling continuously during the import.

The attachment covers a complete loading process.

 

Thanks

 

From: Mukvin 
Sent: May 19, 2022 21:28
To: michael.ga...@hotmail.com; user@kylin.apache.org
Subject: Re: Re: File does not exist ... fairscheduler.xml

 

 

Hi Gao Michael, 

Could you provide the full kylin.log, located at $KYLIN_HOME/logs/kylin.log?

Let me get the full trace of this error.

 

--

Best regards.

Tengting Xu

 

At 2022-05-19 19:41:20, "Gao Michael"  wrote:

Hi all,

Can anyone help?

 

From: Gao Michael 
Sent: May 18, 2022 19:23
To: user@kylin.apache.org
Subject: File does not exist ... fairscheduler.xml

 

Hi all

I got the following error when I tried to add a table to my Kylin project via the web 
interface:

2022-05-18 18:48:37,904 INFO  [Thread-7] ui.SparkUI:57 : Bound SparkUI to 
0.0.0.0, and started at http://hadoop-cluster-001:4040
2022-05-18 18:48:38,681 ERROR [Thread-7] scheduler.FairSchedulableBuilder:94 : 
Error while building the fair scheduler pools
java.io.FileNotFoundException: File does not exist: 
/home/michael/opt/modules/kylin/apache-kylin-4.0.1-bin-spark3/conf/fairscheduler.xml

and I’m sure the file exists!

michael@hadoop-cluster-001:~/opt/modules/kylin/apache-kylin-4.0.1-bin-spark3$ ls conf/
fairscheduler.xml              kylin_job_conf_inmem.xml  kylin.properties               setenv.sh       spark-executor-log4j.properties
kylin_hive_conf.xml            kylin_job_conf.xml        kylin-server-log4j.properties  setenv-tool.sh
kylin_job_conf_cube_merge.xml  kylin-kafka-consumer.xml  kylin-tools-log4j.properties   spark-driver-log4j.properties

What’s the problem and how can I solve it? Thanks!

Re: Re: Re: Re: Re: Re: Re: File does not exist ....... fairscheduler.xml

2022-05-20 Thread Gao Michael

Hi Tengting Xu
Thanks for your help, it works!
I downloaded spark3.2.1 and set SPARK_HOME in .bashrc to the path of spark 3.1.1; 
now I can import Hive tables via the Kylin web interface.
So is there any way to keep spark3.2.1 running on the cluster while Kylin runs 
on spark3.1.1?

Thanks!



Re: Re: Re: Re: Re: Re: File does not exist ....... fairscheduler.xml

2022-05-19 Thread Mukvin

Hi Gao Michael,
I mean you can export SPARK_HOME=`path to spark 3.1.1` for Kylin; there is no need 
to downgrade 3.2.1.

You can keep them both.
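
One way to keep both installations is to override SPARK_HOME only for the Kylin process and leave the shell's default untouched. The sketch below illustrates that idea; the helper name `kylin_env` and all paths are hypothetical, and it is not something Kylin itself provides.

```python
import os
from typing import Dict, Mapping, Optional


def kylin_env(spark_home: str, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with SPARK_HOME overridden.

    The caller's own environment (e.g. SPARK_HOME pointing at Spark 3.2.1)
    is not modified, so both Spark installations can coexist.
    """
    env = dict(os.environ if base is None else base)
    env["SPARK_HOME"] = spark_home
    return env


# Example use (hypothetical paths), starting Kylin with the overridden variable:
#   import subprocess
#   subprocess.run(["/path/to/kylin/bin/kylin.sh", "start"],
#                  env=kylin_env("/path/to/spark-3.1.1"), check=True)
```

Other jobs submitted from the same machine keep using whatever SPARK_HOME the shell already exports.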

--

Best regards.
Tengting Xu





Re: Re: Re: Re: Re: File does not exist ....... fairscheduler.xml

2022-05-19 Thread Gao Michael

Hi Tengting Xu:
I already deployed spark3.2.1 on my cluster, you mean I should downgrade it to 
3.1.1?



Re: Re: Re: Re: File does not exist ....... fairscheduler.xml

2022-05-19 Thread Mukvin

Hi Gao Michael,
As I checked the log file, I found two errors in it. 
"controller.TableController:131 : Failed to load Hive Table 
java.lang.StackOverflowError" and "File not found exception".
They are related to spark.
As you mentioned, your environment runs Spark 3.2.1, but currently 
kylin-4.0.1-bin-spark3.tar.gz only supports Spark 3.1.1, as Kylin's official 
site mentions, so you may need to download Spark 3.1.1 and export its path 
as SPARK_HOME. Then you can restart Kylin and check again.




--

Best regards.
Tengting Xu





Re: Re: File does not exist ....... fairscheduler.xml

2022-05-19 Thread Mukvin




Hi Gao Michael, 

Could you provide the full kylin.log, located at $KYLIN_HOME/logs/kylin.log?
That will let me see the full trace of this error.




--

Best regards.
Tengting Xu




At 2022-05-19 19:41:20, "Gao Michael"  wrote:

Hi all,

Can anyone help?

From: Gao Michael
Sent: May 18, 2022 19:23
To: user@kylin.apache.org
Subject: File does not exist ... fairscheduler.xml

Hi all,

I got the following error when I tried to add a table to my Kylin project via
the web interface:

2022-05-18 18:48:37,904 INFO  [Thread-7] ui.SparkUI:57 : Bound SparkUI to 0.0.0.0, and started at http://hadoop-cluster-001:4040

2022-05-18 18:48:38,681 ERROR [Thread-7] scheduler.FairSchedulableBuilder:94 : Error while building the fair scheduler pools

java.io.FileNotFoundException: File does not exist: /home/michael/opt/modules/kylin/apache-kylin-4.0.1-bin-spark3/conf/fairscheduler.xml

I am sure the file exists:

michael@hadoop-cluster-001:~/opt/modules/kylin/apache-kylin-4.0.1-bin-spark3$ ls conf/
fairscheduler.xml              kylin_job_conf_inmem.xml  kylin.properties               setenv.sh                      spark-executor-log4j.properties
kylin_hive_conf.xml            kylin_job_conf.xml        kylin-server-log4j.properties  setenv-tool.sh
kylin_job_conf_cube_merge.xml  kylin-kafka-consumer.xml  kylin-tools-log4j.properties   spark-driver-log4j.properties

What is the problem, and how can I solve it? Thanks!
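
One possible explanation, offered here as an assumption rather than a confirmed diagnosis: a Hadoop-style path written without a scheme is resolved against the cluster's default filesystem, so a file that `ls` shows on the local disk can still be reported as missing if the lookup actually goes to HDFS. The sketch below only illustrates that resolution rule. `qualify_path` and the `hdfs://namenode:8020` default are hypothetical names for illustration, not Kylin or Spark APIs.

```python
from urllib.parse import urlparse


def qualify_path(path: str, default_fs: str) -> str:
    """Return the fully qualified URI that a scheme-less path would resolve to."""
    if urlparse(path).scheme:
        # An explicit scheme (file://, hdfs://, ...) is kept exactly as written.
        return path
    # No scheme: the path is interpreted relative to the default filesystem.
    return default_fs.rstrip("/") + "/" + path.lstrip("/")


# Path taken from the error message above; the default filesystem is assumed.
conf_file = "/home/michael/opt/modules/kylin/apache-kylin-4.0.1-bin-spark3/conf/fairscheduler.xml"
print(qualify_path(conf_file, "hdfs://namenode:8020"))
print(qualify_path("file://" + conf_file, "hdfs://namenode:8020"))
```

If the first printed form is what the job really looks up, comparing `hdfs dfs -ls` on that path with the local `ls` would confirm or rule out this explanation.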

Re: Re: mysql connect string

2022-05-18 Thread Mukvin

Hi,
If you want to unsubscribe from the mailing list, send an email to
user-unsubscr...@kylin.apache.org or dev-unsubscr...@kylin.apache.org, not to
user@kylin.apache.org.







--

Best regards.
Tengting Xu




At 2022-05-18 13:09:23, "许颖众" wrote:

Unsubscribe

From: Gao Michael
Sent: May 18, 2022 11:53
To: user@kylin.apache.org
Subject: mysql connect string

 


Hi all

My cluster environment is:

Ubuntu Server 20.04
Hadoop 3.3.2
Hive 3.1.2
HBase 2.4.11
Spark 3.2.1
mysql  Ver 8.0.29-0ubuntu0.20.04.3 for Linux on x86_64

My Kylin version is: kylin-4.0.1-bin-spark3.tar.gz

I created the user kylin in MySQL and created two databases, kylin and
kylin_metastore:

 

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| hive_metastore     |
| information_schema |
| kylin              |
| kylin_metastore    |
| mysql              |
| performance_schema |
| sys                |
+--------------------+
7 rows in set (0.00 sec)

 

I created the ext folder:

michael@hadoop-cluster-001:~/opt/modules/kylin/apache-kylin-4.0.1-bin-spark3$ ls ext/
mysql-connector-java.jar

 

And I also set kylin.metadata.url in conf/kylin.properties:

kylin.metadata.url=kylin_metadata@jdbc,url=jdbc:mysql://hostname:3306/kylin,username=kylin,password=kylin,maxActive=10,maxIdle=10

 

But when I start Kylin, I get the error below:

 

2022-05-18 11:37:31,706 INFO  [localhost-startStop-1] 
persistence.JDBCConnectionManager:92 : Connecting to Jdbc with 
url:jdbc:mysql://hostname:3306/kylin by user kylin

2022-05-18 11:37:31,937 WARN  [localhost-startStop-1] 
support.XmlWebApplicationContext:550 : Exception encountered during context 
initialization - cancelling refresh attempt: 
org.springframework.beans.factory.UnsatisfiedDependencyException: Error 
creating bean with name 'diagnosisService': Unsatisfied dependency expressed 
through field 'aclEvaluate'; nested exception is 
org.springframework.beans.factory.UnsatisfiedDependencyException: Error 
creating bean with name 'aclEvaluate': Unsatisfied dependency expressed through 
field 'aclUtil'; nested exception is 
org.springframework.beans.factory.BeanCreationException: Error creating bean 
with name 'aclUtil' defined in URL 
[jar:file:/home/michael/opt/modules/kylin/apache-kylin-4.0.1-bin-spark3/tomcat/webapps/kylin/WEB-INF/lib/kylin-server-base-4.0.1.jar!/org/apache/kylin/rest/util/AclUtil.class]:
 Initialization of bean failed; nested exception is 
org.springframework.beans.factory.BeanCreationException: Error creating bean 
with name 'expressionHandler' defined in class path resource 
[kylinSecurity.xml]: Cannot resolve reference to bean 'permissionEvaluator' 
while setting bean property 'permissionEvaluator'; nested exception is 
org.springframework.beans.factory.BeanCreationException: Error creating bean 
with name 'permissionEvaluator' defined in class path resource 
[kylinSecurity.xml]: Cannot resolve reference to bean 'aclService' while 
setting constructor argument; nested exception is 
org.springframework.beans.factory.BeanCreationException: Error creating bean 
with name 'aclService' defined in URL 
[jar:file:/home/michael/opt/modules/kylin/apache-kylin-4.0.1-bin-spark3/tomcat/webapps/kylin/WEB-INF/lib/kylin-server-base-4.0.1.jar!/org/apache/kylin/rest/service/AclService.class]:
 Instantiation of bean failed; nested exception is 
org.springframework.beans.BeanInstantiationException: Failed to instantiate 
[org.apache.kylin.rest.service.AclService]: Constructor threw exception; nested 
exception is java.lang.IllegalArgumentException: Failed to find metadata store 
by url: 
kylin_metadata@jdbc,url=jdbc:mysql://hostname:3306/kylin,username=kylin,password=kylin,maxActive=10,maxIdle=10

2022-05-18 11:37:31,941 ERROR [localhost-startStop-1] context.ContextLoader:350 
: Context initialization failed

org.springframework.beans.factory.UnsatisfiedDependencyException: Error 
creating bean with name 'diagnosisService': Unsatisfied dependency expressed 
through field 'aclEvaluate'; nested exception is 
org.springframework.beans.factory.UnsatisfiedDependencyException: Error 
creating bean with name 'aclEvaluate': Unsatisfied dependency expressed through 
field 'aclUtil'; nested exception is 
org.springframework.beans.factory.BeanCreationException: Error creating bean 
with name 'aclUtil' defined in URL 
[jar:file:/home/michael/opt/modules/kylin/apache-kylin-4.0.1-bin-spark3/tomcat/webapps/kylin/WEB-INF/lib/kylin-server-base-4.0.1.jar!/org/apache/kylin/rest/util/AclUtil.class]:
 Initialization of bean failed; nested exception is 
org.springframework.beans.factory.BeanCreationException: Error creating bean 
with name 'expressionHandler'
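
When a "Failed to find metadata store by url" error like the one above appears, it can help to split the `kylin.metadata.url` value into its parts and check each one by eye, for example whether `hostname` is still a placeholder or whether the database name is the intended one. The sketch below is a minimal illustration under stated assumptions: `parse_metadata_url` is a hypothetical helper, not Kylin's own parser, and it assumes that no value contains a comma.

```python
def parse_metadata_url(value: str):
    """Split 'identifier@scheme,key=value,...' into (identifier, scheme, params)."""
    identifier, _, rest = value.partition("@")
    scheme, *pairs = rest.split(",")
    params = {}
    for pair in pairs:
        # Split on the first '=' only, so values may themselves contain '='.
        key, _, val = pair.partition("=")
        params[key.strip()] = val.strip()
    return identifier, scheme, params


# Value copied from the conf/kylin.properties line quoted in the message above.
url = ("kylin_metadata@jdbc,url=jdbc:mysql://hostname:3306/kylin,"
       "username=kylin,password=kylin,maxActive=10,maxIdle=10")
identifier, scheme, params = parse_metadata_url(url)
print(identifier, scheme)
for key, val in params.items():
    print(f"  {key} = {val}")
```

This only makes the configured fields easier to read; it does not tell which of them, if any, Kylin rejected.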

Re: Request for support: real-time consumption from Kafka

2022-05-12 Thread Xiaoxiang Yu

It looks like https://kylin.apache.org/docs31/tutorial/realtime_olap.html
provides enough information for you to set up a demo.




--

Best wishes to you ! 
From ：Xiaoxiang Yu




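At 2022-05-11 16:29:21, "朱杨坤" wrote:

I am currently using Kylin 3.0 and do not know how to consume from Kafka.

1. If you prefer to ingest Kafka events in micro-batches (latency on the order
of 10 minutes), you can consider the older Near RT Streaming.

2. If you prefer to ingest and query Kafka messages in real time (data latency
on the order of seconds), use Real-time OLAP.

Because both features target Kafka data sources, do not mix them.

I plan to use option 2, but I do not know how to make it consume the data.

-- Original --
From: "朱杨坤";
Date: Wednesday, May 11, 2022, 4:26 PM
To: "user";
Subject: Request for support: real-time consumption from Kafka

Request for support: real-time consumption from Kafka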

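Re: Request for support: real-time consumption from Kafka

2022-05-11 Thread 朱杨坤

I am currently using Kylin 3.0 and do not know how to consume from Kafka.

1. If you prefer to ingest Kafka events in micro-batches (latency on the order
of 10 minutes), you can consider the older Near RT Streaming.

2. If you prefer to ingest and query Kafka messages in real time (data latency
on the order of seconds), use Real-time OLAP.

Because both features target Kafka data sources, do not mix them.

I plan to use option 2, but I do not know how to make it consume the data.

--Original--
From: "朱杨坤";
Date: Wednesday, May 11, 2022, 4:26 PM
To: "user";
Subject: Request for support: real-time consumption from Kafka

Request for support: real-time consumption from Kafka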

Re: Kylin 3.1.3 cube build stuck

2022-04-20 Thread Yaqian Zhang

Hi:

Could you check whether there is any abnormal output in the Spark task log?

> On April 20, 2022, at 11:53 AM, 黄奇 wrote:
>
> The Kylin cube build keeps stalling even though the cluster has sufficient
> resources. What could cause this?
>
> <1650426555(1).jpg><1650426762(1).png>

RE: Kylin for HDP 3.1.4

2022-04-07 Thread Juan Pedro Barbancho Manchón

Hi,

This is my package config:

https://github.com/barbyware/ambari-kylin-service/blob/master/configuration/kylin.xml



download.location

https://downloads.apache.org/kylin/apache-kylin-3.1.3/apache-kylin-3.1.3-bin-hadoop3.tar.gz
Location to download gstore
 





From: Yaqian Zhang
Sent: Wednesday, April 6, 2022 10:50
To: user@kylin.apache.org
Cc: Juan Pedro Barbancho Manchón
Subject: Re: Kylin for HDP 3.1.4

Hi:

Could you confirm that the Kylin binary package you downloaded is
"apache-kylin-3.1.3-bin-hadoop3.tar.gz"
(https://www.apache.org/dyn/closer.cgi/kylin/apache-kylin-3.1.3/apache-kylin-3.1.3-bin-hadoop3.tar.gz)?

On April 3, 2022, at 2:36 AM, Juan Pedro Barbancho Manchón
<juanpedro.barban...@rsi.cajarural.com> wrote:

Hi guys,

I have an older version of HDP, the 3.1.4 release.

I tried to install Kylin 3.1.3, but when I try to load a table it throws:

 java.lang.NoSuchMethodError:
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(Lorg/apache/hadoop/conf/Configuration;)V

I tried adding some jars, but I think I need to recompile Kylin against my
Spark 2.3.4, Hive 3.1, and Hadoop 3.1.

Has anyone built this configuration and could share it with me? Is it only the
Kylin server, or do I need to change and compile more files?

I appreciate any help.

Thanks


LEGAL ADVICE --- "This message can contain restricted
confidential information or personal data. If you are not the intended
recipient (or the responsible to give it) you shouldn't copy or forward this
message. If this message has been received by mistake, please, delete it and
inform to addressee. If you or your company don't accept this kind of
information by internet, please send us a notification inmediately. Grupo Caja
Rural are not responsible for the opinions, conclusions, contents or any file
attached included in this message, which were not related to professional
matters." ---

Juan Pedro Barbancho Manchón
ANALYTICS - CRM - DISTRIBUCIÓN

Phone: 918076707

Re: Kylin for HDP 3.1.4

2022-04-06 Thread Yaqian Zhang

Hi:

Could you confirm that the Kylin binary package you downloaded is
"apache-kylin-3.1.3-bin-hadoop3.tar.gz"?

> On April 3, 2022, at 2:36 AM, Juan Pedro Barbancho Manchón wrote:
>
> Hi guys,
>
> I have an older version of HDP, the 3.1.4 release.
>
> I tried to install Kylin 3.1.3, but when I try to load a table it throws:
>
>  java.lang.NoSuchMethodError:
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(Lorg/apache/hadoop/conf/Configuration;)V
>
> I tried adding some jars, but I think I need to recompile Kylin against my
> Spark 2.3.4, Hive 3.1, and Hadoop 3.1.
>
> Has anyone built this configuration and could share it with me? Is it only
> the Kylin server, or do I need to change and compile more files?
>
> I appreciate any help.
>
> Thanks
>
> Juan Pedro Barbancho Manchón
> ANALYTICS - CRM - DISTRIBUCIÓN
>
> Phone: 918076707

Re: Kylin 3.1.3 cube build failed

2022-04-01 Thread Yaqian Zhang

Hi:

At which step of the cube build does this error occur? Are there any other
errors reported in kylin.log? This does not look like a complete error report,
so it cannot help locate the root cause.

Please also describe the environment where you deployed Kylin, such as the
Hadoop version, the Hive version, and any special configuration, to make
troubleshooting easier.

> On April 1, 2022, at 5:55 PM, 黄奇 wrote:
>
> Is there a DingTalk group for Kylin? I would like to join a Kylin-related
> group.
> I ran into the following problem when building a cube with Kylin 3.1.3 and do
> not know how to solve it. Has anyone solved this kind of problem?
>

Re: Unsubscribe

2022-03-29 Thread Yaqian Zhang

If you want to unsubscribe from the mailing list, send an email to
user-unsubscr...@kylin.apache.org or dev-unsubscr...@kylin.apache.org, not to
user@kylin.apache.org.

> On March 30, 2022, at 10:35 AM, guowj wrote:
>
> Is there still an administrator maintaining the subscriptions? I have
> unsubscribed three times and still receive mail. Please help me unsubscribe.

Re: Unsubscribe

2022-03-29 Thread Yongjie

The subscribe and unsubscribe commands for the mailing lists are described at:
https://kylin.apache.org/community/

To unsubscribe from this list, send an email from your own address to
"dev-unsubscr...@kylin.apache.org".

On Wed, Mar 30, 2022 at 10:35 AM guowj  wrote:

> Is there still an administrator maintaining the subscriptions? I have
> unsubscribed three times and still receive mail. Please help me unsubscribe.
>

-- 

Best regards,

Yongjie

Re: package kylin4.0.1 failed:npm ERR! phantomjs-prebuilt@2.1.16 install: `node install.js`

2022-03-29 Thread Yaqian Zhang

Hi:

A Kylin user once ran into similar problems when packaging. His solution is
included here for your reference:

1. npm ERR! code ELIFECYCLE

Refer to: https://www.cleey.com/blog/single/id/911.html
Download already available at 
/tmp/phantomjs/phantomjs-2.1.1-linux-x86_64.tar.bz2
Verified checksum of previously downloaded file
Extracting tar contents (via spawned process)
Removing /usr/local/open_dnsdb/dnsdb_fe/node_modules/phantomjs-prebuilt/lib/phantom
Copying extracted folder 
/tmp/phantomjs/phantomjs-2.1.1-linux-x86_64.tar.bz2-extract-1614183154060/phantomjs-2.1.1-linux-x86_64
 -> /usr/local/open_dnsdb/dnsdb_fe/node_modules/phantomjs-prebuilt/lib/phantom
Phantom installation failed { [Error: EACCES: permission denied, link 
'/tmp/phantomjs/phantomjs-2.1.1-linux-x86_64.tar.bz2-extract-1614183154060/phantomjs-2.1.1-linux-x86_64'
 -> 
'/usr/local/open_dnsdb/dnsdb_fe/node_modules/phantomjs-prebuilt/lib/phantom']
errno: -13,
code: 'EACCES',
syscall: 'link',
path:
'/tmp/phantomjs/phantomjs-2.1.1-linux-x86_64.tar.bz2-extract-1614183154060/phantomjs-2.1.1-linux-x86_64',
dest:
'/usr/local/open_dnsdb/dnsdb_fe/node_modules/phantomjs-prebuilt/lib/phantom' } 
Error: EACCES: permission denied, link 
'/tmp/phantomjs/phantomjs-2.1.1-linux-x86_64.tar.bz2-extract-1614183154060/phantomjs-2.1.1-linux-x86_64'
 -> '/usr/local/open_dnsdb/dnsdb_fe/node_modules/phantomjs-prebuilt/lib/phantom'
npm WARN optional SKIPPING OPTIONAL DEPENDENCY: fsevents@1.2.9 
(node_modules/fsevents):
npm WARN notsup SKIPPING OPTIONAL DEPENDENCY: Unsupported platform for 
fsevents@1.2.9: wanted {"os":"darwin","arch":"any"} (current: 
{"os":"linux","arch":"x64"})

npm ERR! code ELIFECYCLE
Solution:
# Download
wget https://npm.taobao.org/mirrors/phantomjs/phantomjs-2.1.1-linux-x86_64.tar.bz2
tar -jxvf phantomjs-2.1.1-linux-x86_64.tar.bz2
# Add to the environment variables
vim /etc/profile
# Append the following line at the end of the file; mind the file path
export PATH=$PATH:/usr/local/phantomjs-2.1.1-linux-x86_64/bin
# Apply the change
source /etc/profile


rm -rf ./node_modules && npm install --unsafe-perm
Solved.



2. phantomjs-prebuilt@2.1.16 error

Refer to https://blog.csdn.net/XuM22/article/details/82790802?utm_medium=distribute.pc_relevant.none-task-b

> On March 30, 2022, at 9:48 AM, guqiujun wrote:
>
> When I tried to package Kylin 4.0.1, I ran package.sh and hit the following
> problem:
> > phantomjs-prebuilt@2.1.16 install 
> > /home/packageTest/apache-kylin-4.0.1/webapp/node_modules/phantomjs-prebuilt
> > node install.js
> 
> Considering PhantomJS found at /home/node/bin/phantomjs
> Looks like an `npm install -g`
> Could not link global install, skipping...
> Download already available at 
> /tmp/phantomjs/phantomjs-2.1.1-linux-x86_64.tar.bz2
> Verified checksum of previously downloaded file
> Extracting tar contents (via spawned process)
> Removing 
> /home/packageTest/apache-kylin-4.0.1/webapp/node_modules/phantomjs-prebuilt/lib/phantom
> Copying extracted folder 
> /tmp/phantomjs/phantomjs-2.1.1-linux-x86_64.tar.bz2-extract-1648604149099/phantomjs-2.1.1-linux-x86_64
>  -> 
> /home/packageTest/apache-kylin-4.0.1/webapp/node_modules/phantomjs-prebuilt/lib/phantom
> chmod failed: phantomjs was not successfully copied to 
> /home/packageTest/apache-kylin-4.0.1/webapp/node_modules/phantomjs-prebuilt/lib/phantom/bin/phantomjs
> npm WARN tap@16.0.0 requires a peer of coveralls@^3.1.1 but none is 
> installed. You must install peer dependencies yourself.
> npm WARN tap@16.0.0 requires a peer of flow-remove-types@>=2.112.0 but none 
> is installed. You must install peer dependencies yourself.
> npm WARN tap@16.0.0 requires a peer of ts-node@>=8.5.2 but none is installed. 
> You must install peer dependencies yourself.
> npm WARN tap@16.0.0 requires a peer of typescript@>=3.7.2 but none is 
> installed. You must install peer dependencies yourself.
> npm WARN ws@7.5.7 requires a peer of bufferutil@^4.0.1 but none is installed. 
> You must install peer dependencies yourself.
> npm WARN ws@7.5.7 requires a peer of utf-8-validate@^5.0.2 but none is 
> installed. You must install peer dependencies yourself.
> npm WARN base@0.0.1 No license field.
> npm WARN optional SKIPPING OPTIONAL DEPENDENCY: fsevents@2.3.2 
> (node_modules/fsevents):
> npm WARN notsup SKIPPING OPTIONAL DEPENDENCY: Unsupported platform for 
> fsevents@2.3.2: wanted {"os":"darwin","arch":"any"} (current: 
> {"os":"linux","arch":"x64"})
> 
> npm ERR! code ELIFECYCLE
> npm ERR! errno 1
> npm ERR! phantomjs-prebuilt@2.1.16 install: `node install.js`
> npm ERR! Exit status 1
> npm ERR!
> npm ERR! Failed at the phantomjs-prebuilt@2.1.16 install script.
> npm ERR! This is probably not a problem with npm. There is likely additional 
> logging output above.
> 
> npm ERR! A complete log of this run can be found in:
> npm ERR! /root/.npm/_logs/2022-03-30T01_35_52_321Z-debug.log
> 
> Here is the environment information:
> CentOS 7
> Apache Maven 3.5.4
> git version 1.8.3.1
> node-v10.14.1-linux-x64
> npm 6.4.1
> 
> Attached is the log

Re: Build Package for kylin4 failed

2022-03-18 Thread Yaqian Zhang

Maybe this is related to the version of maven-shade-plugin. You can upgrade
maven-shade-plugin and try again.

> On March 16, 2022, at 3:54 PM, guqiujun wrote:
>
> When I build the package for Kylin 4 (by running build/script/package.sh), I
> hit the following problem:
> 
> [INFO] Apache Kylin 4.X - Integration Test  SUCCESS [ 31.972 
> s]
> [INFO] Apache Kylin - Parquet Assembly 4.0.1 .. FAILURE [ 27.811 
> s]
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 13:01 min
> [INFO] Finished at: 2022-03-16T15:37:14+08:00
> [INFO] 
> 
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-shade-plugin:3.1.0:shade (default) on project 
> parquet-assembly: Error creating shaded jar: null: IllegalArgumentException 
> -> [Help 1]
> org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute 
> goal org.apache.maven.plugins:maven-shade-plugin:3.1.0:shade (default) on 
> project parquet-assembly: Error creating shaded jar: null
> at org.apache.maven.lifecycle.internal.MojoExecutor.execute 
> (MojoExecutor.java:213)
> at org.apache.maven.lifecycle.internal.MojoExecutor.execute 
> (MojoExecutor.java:154)
> at org.apache.maven.lifecycle.internal.MojoExecutor.execute 
> (MojoExecutor.java:146)
> at 
> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject 
> (LifecycleModuleBuilder.java:117)
> at 
> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject 
> (LifecycleModuleBuilder.java:81)
> at 
> org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build
>  (SingleThreadedBuilder.java:56)
> at org.apache.maven.lifecycle.internal.LifecycleStarter.execute 
> (LifecycleStarter.java:128)
> at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:305)
> at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:192)
> at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:105)
> at org.apache.maven.cli.MavenCli.execute (MavenCli.java:954)
> at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:288)
> at org.apache.maven.cli.MavenCli.main (MavenCli.java:192)
> at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke 
> (NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke 
> (DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke (Method.java:498)
> at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced 
> (Launcher.java:289)
> at org.codehaus.plexus.classworlds.launcher.Launcher.launch 
> (Launcher.java:229)
> at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode 
> (Launcher.java:415)
> at org.codehaus.plexus.classworlds.launcher.Launcher.main 
> (Launcher.java:356)
> Caused by: org.apache.maven.plugin.MojoExecutionException: Error creating 
> shaded jar: null
> at org.apache.maven.plugins.shade.mojo.ShadeMojo.execute 
> (ShadeMojo.java:546)
> at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo 
> (DefaultBuildPluginManager.java:137)
> at org.apache.maven.lifecycle.internal.MojoExecutor.execute 
> (MojoExecutor.java:208)
> at org.apache.maven.lifecycle.internal.MojoExecutor.execute 
> (MojoExecutor.java:154)
> at org.apache.maven.lifecycle.internal.MojoExecutor.execute 
> (MojoExecutor.java:146)
> at 
> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject 
> (LifecycleModuleBuilder.java:117)
> at 
> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject 
> (LifecycleModuleBuilder.java:81)
> at 
> org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build
>  (SingleThreadedBuilder.java:56)
> at org.apache.maven.lifecycle.internal.LifecycleStarter.execute 
> (LifecycleStarter.java:128)
> at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:305)
> at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:192)
> at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:105)
> at org.apache.maven.cli.MavenCli.execute (MavenCli.java:954)
> at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:288)
> at org.apache.maven.cli.MavenCli.main (MavenCli.java:192)
> at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke 
> (NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke 
> (DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke (Method.java:498)
> at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced 
> (Launcher.java:289)
> at org.codehaus.plexus.classworlds.launcher.Launcher.launch

Re:[Announcement] Beta release of MDX for Kylin

2022-03-10 Thread Mukvin

The address of link 2 in the Announcement is
https://mp.weixin.qq.com/s/w4nTjwh0sq6ze4gyXwL1HA .

Best regards.
Tengting Xu

At 2022-03-11 14:19:03, "Mukvin" wrote:

Hi all,
At the beginning of 2022, Kylin community discussed the positioning of
multidimensional databases and the idea of building a business semantic layer
based on Kylin. After a period of development and testing, we are pleased to
announce that Kylin community has released a technical preview version of "MDX
for Kylin", an MDX query engine which supports Apache Kylin 4.X as Data Source.
MDX for Kylin allows Kylin users to use Excel to analyze big data.
"MDX for Kylin" is developed based on Mondrian[Link3], and it is
contributed by Kyligence. The user experience of "MDX for Kylin" is close to
Microsoft SSAS, and it can integrate a variety of data analysis tools,
including Microsoft Excel, Tableau, etc. "MDX for Kylin" provides a more
extreme experience for big data analysis scenarios.
Please refer to the link to learn and try:
https://kylin.apache.org/docs/tutorial/quick_start_for_mdx.html.

Further reading
1. https://lists.apache.org/thread/4fkhyw1fyf0jg5cb18v7vxyqbn6vm3zv
2. How to query Kylin with Excel? MDX for Kylin!
3. https://mondrian.pentaho.com/documentation/mdx.php
4. https://docs.microsoft.com/en-us/sql/mdx/mdx-syntax-elements-mdx
5.
https://dba.stackexchange.com/questions/138311/good-example-of-mdx-vs-sql-for-analytical-queries
6. https://kyligence.io/blog/opportunities-for-ssas-in-the-cloud/
7.
https://kyligence.io/blog/semantic-layer-the-bi-trend-you-dont-want-to-miss-in-2020/
8. https://docs.kyligence.io/books/mdx/v1.3/en/index.html
9.
https://medium.com/airbnb-engineering/how-airbnb-achieved-metric-consistency-at-scale-f23cc53dea70
Mukvin
boyboys...@163.com

Re: Can the metadata store field META_TABLE_KEY be duplicated?

2022-03-03 Thread Yaqian Zhang

Hi:

The field META_TABLE_KEY will not be duplicated. Its value corresponds to the
path of each metadata entry.

> On March 3, 2022, at 3:43 PM, guqiujun wrote:
>
> While Kylin 4.x is running, can the metadata store field META_TABLE_KEY ever
> contain duplicate values?
>

Re: How to import data in docker env

2022-02-14 Thread Yaqian Zhang

Hi:
If you want to perform some operations on the command line inside the docker 
container, you need to enter the docker container, which requires you to 
understand some docker operation commands.
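
As a rough illustration of the Docker commands involved, the sketch below builds two `docker exec` command lines: one that opens a shell in the running container, and one that runs a Hive statement there. It is a sketch under stated assumptions, not a tested recipe for the Kylin image. The container name `kylin`, the table `demo`, and the helper `docker_exec` are hypothetical, and it assumes the `hive` CLI is on the PATH inside the container.

```python
import shlex


def docker_exec(container: str, *command: str, interactive: bool = False) -> list:
    """Build the argv for running a command inside a running container."""
    argv = ["docker", "exec"]
    if interactive:
        # -i keeps stdin open and -t allocates a terminal, as needed for a shell.
        argv.append("-it")
    return argv + [container, *command]


container = "kylin"  # hypothetical name; use the name or ID shown by `docker ps`
commands = [
    # Open an interactive shell inside the container.
    docker_exec(container, "bash", interactive=True),
    # Run one Hive statement non-interactively (assumes hive is installed there).
    docker_exec(container, "hive", "-e",
                "CREATE TABLE IF NOT EXISTS demo (id INT, name STRING)"),
]
for argv in commands:
    print(shlex.join(argv))
```

The script only prints the command lines so they can be reviewed and pasted into a terminal; nothing is executed against Docker.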


> On February 11, 2022, at 7:56 PM, 无用先生 <80830...@qq.com> wrote:
>
> Hi all,
>
> I am trying to learn Kylin via the Docker environment. I downloaded the image
> and started it successfully, but now I need to import some data and build my
> cube project.
> How can I create a table in Hive and import data? Hive normally offers some
> page for doing such things, but how can I create a table and import data in
> the Kylin Docker environment?
>
> Thanks

Re: Can't access kylin web ui

2022-02-10 Thread Yaqian Zhang

Hi:

Maybe Kylin encountered some problems when starting in the docker environment 
and failed to start successfully. 

You can use the `docker exec` command to enter the container, check what went
wrong, and restart Kylin manually.

> On February 10, 2022, at 6:44 PM, 无用先生 <80830...@qq.com> wrote:
>
> Hi all,
> I just started the Docker image according to this page:
> https://kylin.apache.org/docs40/install/kylin_docker.html
> Now I can open the following pages:
> Hdfs NameNode Web UI: http://127.0.0.1:50070
> Yarn ResourceManager Web UI: http://127.0.0.1:8088
> But I can't open
> Kylin Web UI: http://127.0.0.1:7070/kylin/login
> There was no error message when I started the Docker image with the following
> command:
> 
> docker run -d -m 8G -p 7070:7070 -p 8088:8088 -p 50070:50070 -p 8032:8032 -p 
> 8042:8042 -p 2181:2181 apachekylin/apache-kylin-standalone:4.0.0
> 
> 
>

Re: Does Kylin 4.0.1 supports CDP 7.1.5

2022-02-10 Thread Yaqian Zhang

Hi Venkatesh:

Here is a list of all Hadoop environments supported and verified by Kylin 4.0.1,
for your reference.

https://cwiki.apache.org/confluence/display/KYLIN/Support+Hadoop+Version+Matrix+of+Kylin+4

We haven't tested in a CDP environment and can't support CDP 7.1.5 at present.


> On February 10, 2022, at 10:08 PM, Popalli, Venkatesh wrote:
>
> Team,
>
> Does Kylin 4.0.1 support CDP 7.1.5? We don't see any documentation on this.
> Can you please assist?
>  
>  
> Regards
> Venkatesh
> NOTICE: All information in and attached to the e-mails below may be 
> proprietary, confidential, privileged and otherwise protected from improper 
> or erroneous disclosure. If you are not the sender's intended recipient, you 
> are not authorized to intercept, read, print, retain, copy, forward, or 
> disseminate this message. If you have erroneously received this 
> communication, please notify the sender immediately by phone (704-758-1000) 
> or by e-mail and destroy all copies of this message electronic, paper, or 
> otherwise. By transmitting documents via this email: Users, Customers, 
> Suppliers and Vendors collectively acknowledge and agree the transmittal of 
> information via email is voluntary, is offered as a convenience, and is not a 
> secured method of communication; Not to transmit any payment information E.G. 
> credit card, debit card, checking account, wire transfer information, 
> passwords, or sensitive and personal information E.G. Driver's license, DOB, 
> social security, or any other information the user wishes to remain 
> confidential; To transmit only non-confidential information such as plans, 
> pictures and drawings and to assume all risk and liability for and indemnify 
> Lowe's from any claims, losses or damages that may arise from the transmittal 
> of documents or including non-confidential information in the body of an 
> email transmittal. Thank you.

RE: Pam config and Ranger for kylin

2022-01-20 Thread Juan Pedro Barbancho Manchón

I have seen this document and will review it.

I lack some knowledge here: I need to know how each user is validated and
authenticated. I think this may be done using LDAP, but I hope that I can change
the authenticator class in order to use PAM, similar to Hive (for example).

Thanks

Juan Pedro Barbancho Manchón
ANALYTICS - CRM - DISTRIBUCIÓN

Teléfono. 918076707


From: Yaqian Zhang
Sent: Friday, January 21, 2022 7:45
To: user@kylin.apache.org
Subject: Re: Pam config and Ranger for kylin

Hi:

You can refer to this document:
https://cwiki.apache.org/confluence/display/RANGER/Kylin+Plugin

On Jan 20, 2022, at 5:32 AM, Juan Pedro Barbancho Manchón
<juanpedro.barban...@rsi.cajarural.com> wrote:

Has anyone configured PAM for user access to Apache Kylin, similar to the PAM
configuration for Hive?

I have the same question about Ranger policies for tables.

Could you send some info about it?

Thanks.


Re: Pam config and Ranger for kylin

2022-01-20 Thread Yaqian Zhang

Hi:

You can refer to this document: 
https://cwiki.apache.org/confluence/display/RANGER/Kylin+Plugin

> On Jan 20, 2022, at 5:32 AM, Juan Pedro Barbancho Manchón wrote:
> 
> Has anyone configured PAM for user access to Apache Kylin, similar to the 
> PAM configuration for Hive?
> 
> I have the same question about Ranger policies for tables.
> 
> Could you send some info about it?
> 
> Thanks.

Re: How to UNSUBSCRIBING

2022-01-11 Thread Xiaoxiang Yu

If you want to unsubscribe from this mailing list, please send any text to 
user-unsubscr...@kylin.apache.org or dev-unsubscr...@kylin.apache.org, not to 
user@kylin.apache.org or d...@kylin.apache.org. Sending "unsubscribe" to 
user@kylin.apache.org or d...@kylin.apache.org will not remove you from the list.

FURTHER READING
https://www.apache.org/foundation/mailinglists.html
https://infra.apache.org/contrib-email-tips




--

Best wishes to you ! 
From ：Xiaoxiang Yu




At 2022-01-11 17:18:18, "jiang_wen...@163.com" wrote:

Unsubscribe


jiang_wen...@163.com

Re: [DISCUSS] The future of Apache Kylin

2022-01-11 Thread ShaoFeng Shi

+1

Kylin has been a multi-dimensional OLAP (MOLAP) engine from day one, but because
SQL is its main query language, users find it a little confusing
to differentiate it from other technologies. Introducing the new semantic
layer will make Kylin a more complete solution.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC,
Apache Incubator PMC,
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org




Yaqian Zhang wrote on Tue, Jan 11, 2022 at 16:07:

> Cool!
> Looking forward to the new features of the next generation Apache Kylin.
>
> On Jan 11, 2022, at 2:30 PM, Xiaoxiang Yu wrote:
>
> Thanks Yang, there are two new features that I am really looking forward to,
> and they are:
>
> 1. The new *SEMANTIC LAYER* will make Kylin accessible from Excel (MDX) and
> more BI tools.
> 2. The new *flexible Model* will let Kylin users modify a Model/Cube (such as
> adding/deleting dimensions/measures) whose status is Ready without purging any
> useful cuboid/segment.
>
> --
> *Best wishes to you ! *
> *From ：**Xiaoxiang Yu*
>
>
> At 2022-01-11 13:59:13, "Li Yang"  wrote:
> >Hi All
> >
> >Apache Kylin has been stable for quite a while and it may be a good time to
> >think about the future of it. Below are thoughts from my team and myself.
> >Love to hear yours as well. Ideas and comments are very welcome.  :-)
> >
> >*APACHE KYLIN TODAY*
> >
> >Currently, the latest release of Apache Kylin is 4.0.1. Apache Kylin 4.0 is
> >a major version update after Kylin 3.x (HBase Storage). Kylin 4.0 uses
> >Parquet to replace HBase as storage engine, so as to improve file scanning
> >performance. At the same time, Kylin 4.0 reimplements the spark based build
> >engine and query engine, making it possible to separate computing and
> >storage, and better adapt to the technology trend of cloud native. Kylin
> >4.0 comprehensively updated the build and query engine, realized the
> >deployment mode without Hadoop dependency, decreasing the complexity of
> >deployment. However, Kylin also has a lot to improve: the business
> >semantic layer needs to be strengthened, and modifying a
> >model/cube is not flexible. With these in mind, we are thinking of a few things to do:
> >
> >   - Multi-dimensional query ability friendly to non-technical personnel.
> >   Multi-dimensional model is the key to distinguish Kylin from the general
> >   OLAP engines. The feature is that the model concept based on dimension and
> >   measurement is more friendly to non-technical personnel and closer to the
> >   goal of citizen analyst. The multi-dimensional query capability that
> >   non-technical personnel can use should be the new focus of Kylin
> >   technology.
> >
> >
> >   - Native Engine. The query engine of Kylin still has much room for
> >   improvement in vector acceleration and cpu instruction level optimization.
> >   The Spark community Kylin relies on also has a strong demand for native
> >   engine. It is optimistic that native engine can improve the performance of
> >   Kylin by at least three times, which is worthy of investment.
> >
> >
> >   - More cloud native capabilities. Kylin 4.0 has only completed the
> >   initial cloud deployment and realized the features of rapid deployment and
> >   dynamic resource scaling on the cloud, but there are still many cloud
> >   native capabilities to be developed.
> >
> >More explanations are following.
> >
> >*KYLIN AS A MULTI-DIMENSIONAL DATABASE*
> >
> >The core of Kylin is a multi-dimensional database, which is a special OLAP
> >engine. Although Kylin has always had the ability of a relational database
> >since its birth, and it is often compared with other relational OLAP
> >engines, what really makes Kylin different is multi-dimensional model and
> >multi-dimensional database ability. Considering the essence of Kylin and
> >its wide range of business uses in the future (not only technical uses),
> >positioning Kylin as a multi-dimensional database makes perfect sense. With
> >business semantics and precomputation technology, Apache Kylin helps
> >non-technical people understand and afford big data, and realizes data
> >democratization.
> >
> >*THE SEMANTIC LAYER*
> >
> >The key difference between the multi-dimensional database and the
> >relational database is business expression ability. Although SQL has strong
> >expression ability and is the basic skill of data analysts, SQL and the RDB
> >are still too difficult for non-technical personnel if we aim at "everyone
> >is a data analyst". From the perspective of non-technical personnel, the
> >data lake and data warehouse are like a dark room. They know that there is
> >a lot of data, but they can't see clearly, understand and use this data
> >because they don't understand database theory and SQL.
> >
> >How to make the Data Lake (and data warehouse) clear to non-technical
> >personnel? This requires

Re: [DISCUSS] The future of Apache Kylin

2022-01-11 Thread Yaqian Zhang

Cool! 
Looking forward to the new features of the next generation Apache Kylin.

> On Jan 11, 2022, at 2:30 PM, Xiaoxiang Yu wrote:
> 
> Thanks Yang, there are two new features that I am really looking forward to, and 
> they are:
> 
> 1. The new SEMANTIC LAYER will make Kylin accessible from Excel (MDX) and more 
> BI tools.
> 2. The new flexible Model will let Kylin users modify a Model/Cube (such as 
> adding/deleting dimensions/measures) whose status is Ready without purging any 
> useful cuboid/segment.
> 
> --
> Best wishes to you ! 
> From ：Xiaoxiang Yu
> 
> 
> At 2022-01-11 13:59:13, "Li Yang"  wrote:
> >Hi All
> >
> >Apache Kylin has been stable for quite a while and it may be a good time to
> >think about the future of it. Below are thoughts from my team and myself.
> >Love to hear yours as well. Ideas and comments are very welcome.  :-)
> >
> >*APACHE KYLIN TODAY*
> >
> >Currently, the latest release of Apache Kylin is 4.0.1. Apache Kylin 4.0 is
> >a major version update after Kylin 3.x (HBase Storage). Kylin 4.0 uses
> >Parquet to replace HBase as storage engine, so as to improve file scanning
> >performance. At the same time, Kylin 4.0 reimplements the spark based build
> >engine and query engine, making it possible to separate computing and
> >storage, and better adapt to the technology trend of cloud native. Kylin
> >4.0 comprehensively updated the build and query engine, realized the
> >deployment mode without Hadoop dependency, decreasing the complexity of
> >deployment. However, Kylin also has a lot to improve: the business
> >semantic layer needs to be strengthened, and modifying a
> >model/cube is not flexible. With these in mind, we are thinking of a few things to do:
> >
> >   - Multi-dimensional query ability friendly to non-technical personnel.
> >   Multi-dimensional model is the key to distinguish Kylin from the general
> >   OLAP engines. The feature is that the model concept based on dimension and
> >   measurement is more friendly to non-technical personnel and closer to the
> >   goal of citizen analyst. The multi-dimensional query capability that
> >   non-technical personnel can use should be the new focus of Kylin
> >   technology.
> >
> >
> >   - Native Engine. The query engine of Kylin still has much room for
> >   improvement in vector acceleration and cpu instruction level optimization.
> >   The Spark community Kylin relies on also has a strong demand for native
> >   engine. It is optimistic that native engine can improve the performance of
> >   Kylin by at least three times, which is worthy of investment.
> >
> >
> >   - More cloud native capabilities. Kylin 4.0 has only completed the
> >   initial cloud deployment and realized the features of rapid deployment and
> >   dynamic resource scaling on the cloud, but there are still many cloud
> >   native capabilities to be developed.
> >
> >More explanations are following.
> >
> >*KYLIN AS A MULTI-DIMENSIONAL DATABASE*
> >
> >The core of Kylin is a multi-dimensional database, which is a special OLAP
> >engine. Although Kylin has always had the ability of a relational database
> >since its birth, and it is often compared with other relational OLAP
> >engines, what really makes Kylin different is multi-dimensional model and
> >multi-dimensional database ability. Considering the essence of Kylin and
> >its wide range of business uses in the future (not only technical uses),
> >positioning Kylin as a multi-dimensional database makes perfect sense. With
> >business semantics and precomputation technology, Apache Kylin helps
> >non-technical people understand and afford big data, and realizes data
> >democratization.
> >
> >*THE SEMANTIC LAYER*
> >
> >The key difference between the multi-dimensional database and the
> >relational database is business expression ability. Although SQL has strong
> >expression ability and is the basic skill of data analysts, SQL and the RDB
> >are still too difficult for non-technical personnel if we aim at "everyone
> >is a data analyst". From the perspective of non-technical personnel, the
> >data lake and data warehouse are like a dark room. They know that there is
> >a lot of data, but they can't see clearly, understand and use this data
> >because they don't understand database theory and SQL.
> >
> >How to make the Data Lake (and data warehouse) clear to non-technical
> >personnel? This requires introducing a more friendly data model for
> >non-technical personnel — multi-dimensional data model. While the
> >relational model describes the technical form of data, the
> >multi-dimensional model describes the business form of data. In a MDB,
> >measurement corresponds to business indicators that everyone understands,
> >and dimension is the perspective of comparing and observing these business
> >indicators. Compare KPI with last month and compare performance between
> >parallel business units, which are concepts understood by every
> >non-technical personnel. By mapping the relational model to the
>

Re:[DISCUSS] The future of Apache Kylin

2022-01-10 Thread Xiaoxiang Yu

Thanks Yang, there are two new features that I am really looking forward to, and 
they are:


1. The new SEMANTIC LAYER will make Kylin accessible from Excel (MDX) and more BI 
tools.
2. The new flexible Model will let Kylin users modify a Model/Cube (such as adding/deleting 
dimensions/measures) whose status is Ready without purging any useful 
cuboid/segment.




--

Best wishes to you ! 
From ：Xiaoxiang Yu





At 2022-01-11 13:59:13, "Li Yang"  wrote:
>Hi All
>
>Apache Kylin has been stable for quite a while and it may be a good time to
>think about the future of it. Below are thoughts from my team and myself.
>Love to hear yours as well. Ideas and comments are very welcome.  :-)
>
>*APACHE KYLIN TODAY*
>
>Currently, the latest release of Apache Kylin is 4.0.1. Apache Kylin 4.0 is
>a major version update after Kylin 3.x (HBase Storage). Kylin 4.0 uses
>Parquet to replace HBase as storage engine, so as to improve file scanning
>performance. At the same time, Kylin 4.0 reimplements the spark based build
>engine and query engine, making it possible to separate computing and
>storage, and better adapt to the technology trend of cloud native. Kylin
>4.0 comprehensively updated the build and query engine, realized the
>deployment mode without Hadoop dependency, decreasing the complexity of
>deployment. However, Kylin also has a lot to improve: the business
>semantic layer needs to be strengthened, and modifying a
>model/cube is not flexible. With these in mind, we are thinking of a few things to do:
>
>   - Multi-dimensional query ability friendly to non-technical personnel.
>   Multi-dimensional model is the key to distinguish Kylin from the general
>   OLAP engines. The feature is that the model concept based on dimension and
>   measurement is more friendly to non-technical personnel and closer to the
>   goal of citizen analyst. The multi-dimensional query capability that
>   non-technical personnel can use should be the new focus of Kylin
>   technology.
>
>
>   - Native Engine. The query engine of Kylin still has much room for
>   improvement in vector acceleration and cpu instruction level optimization.
>   The Spark community Kylin relies on also has a strong demand for native
>   engine. It is optimistic that native engine can improve the performance of
>   Kylin by at least three times, which is worthy of investment.
>
>
>   - More cloud native capabilities. Kylin 4.0 has only completed the
>   initial cloud deployment and realized the features of rapid deployment and
>   dynamic resource scaling on the cloud, but there are still many cloud
>   native capabilities to be developed.
>
>More explanations are following.
>
>*KYLIN AS A MULTI-DIMENSIONAL DATABASE*
>
>The core of Kylin is a multi-dimensional database, which is a special OLAP
>engine. Although Kylin has always had the ability of a relational database
>since its birth, and it is often compared with other relational OLAP
>engines, what really makes Kylin different is multi-dimensional model and
>multi-dimensional database ability. Considering the essence of Kylin and
>its wide range of business uses in the future (not only technical uses),
>positioning Kylin as a multi-dimensional database makes perfect sense. With
>business semantics and precomputation technology, Apache Kylin helps
>non-technical people understand and afford big data, and realizes data
>democratization.
>
>*THE SEMANTIC LAYER*
>
>The key difference between the multi-dimensional database and the
>relational database is business expression ability. Although SQL has strong
>expression ability and is the basic skill of data analysts, SQL and the RDB
>are still too difficult for non-technical personnel if we aim at "everyone
>is a data analyst". From the perspective of non-technical personnel, the
>data lake and data warehouse are like a dark room. They know that there is
>a lot of data, but they can't see clearly, understand and use this data
>because they don't understand database theory and SQL.
>
>How to make the Data Lake (and data warehouse) clear to non-technical
>personnel? This requires introducing a more friendly data model for
>non-technical personnel — multi-dimensional data model. While the
>relational model describes the technical form of data, the
>multi-dimensional model describes the business form of data. In a MDB,
>measurement corresponds to business indicators that everyone understands,
>and dimension is the perspective of comparing and observing these business
>indicators. Compare KPI with last month and compare performance between
>parallel business units, which are concepts understood by every
>non-technical personnel. By mapping the relational model to the
>multi-dimensional model, the essence is to enhance the business semantics
>on the technical data, form a business semantic layer, and help
>non-technical personnel understand, explore and use the data. In order to
>enhance Kylin's ability as the semantic layer, supporting multi-dimensional
>query

Re: Unsubscribe

2022-01-10 Thread Xiaoxiang Yu

If you want to unsubscribe from this mailing list, please send any text to 
user-unsubscr...@kylin.apache.org or dev-unsubscr...@kylin.apache.org, not to 
user@kylin.apache.org or d...@kylin.apache.org. Sending an unsubscribe email to 
user@kylin.apache.org or d...@kylin.apache.org will not remove you from the list.

--

Best wishes to you ! 
From ：Xiaoxiang Yu




At 2022-01-07 09:04:21, "Liu Ya Meng" wrote:

Unsubscribe






 

 
--
Thanks & Best Regards

Re: Kylin 3.1.3: Error when get coordinator leader

2022-01-06 Thread Árki Gábor

It seems to be working now, thank you for your help.

On Thu., Jan. 6, 2022, 03:37 Yaqian Zhang,  wrote:

> Hi Gabor:
>
> In order to fix security issues, some modifications have been made to the
> real-time function of Kylin 3.1.3.
>
> You need to set "kylin.server.mode=stream_coordinator" in
> kylin.properties on the coordinator node; the coordinator node can no longer
> be set to "kylin.server.mode=all".
>
> Thank you for your test and report. We will update the documentation later to
> remind Kylin users of this change.
>
> On Jan 6, 2022, at 4:57 AM, Árki Gábor wrote:
>
> Hi All,
>
> I wanted to try out Kylin 3.1.3 with a clean installation but ran into an
> issue. The stream receiver is unable to start due to the following error:
>
> 2021-12-30 18:20:07,474 ERROR [main] server.StreamingReceiver:53 :
> streaming receiver start fail
> org.apache.kylin.stream.coordinator.exception.StoreException:
> com.fasterxml.jackson.core.JsonParseException: Unexpected character ('.'
> (code 46)): Expected space separating root-level values
>  at [Source: (byte[])"24.0.1.128"; line: 1, column: 6]
> at
> org.apache.kylin.stream.coordinator.ZookeeperStreamMetadataStore.getCoordinatorNode(ZookeeperStreamMetadataStore.java:276)
> at
> org.apache.kylin.stream.coordinator.client.HttpCoordinatorClient.<init>(HttpCoordinatorClient.java:53)
> at
> org.apache.kylin.stream.server.StreamingServer.<init>(StreamingServer.java:126)
> at
> org.apache.kylin.stream.server.StreamingServer.getInstance(StreamingServer.java:141)
> at
> org.apache.kylin.stream.server.StreamingReceiver.startStreamingServer(StreamingReceiver.java:67)
> at
> org.apache.kylin.stream.server.StreamingReceiver.start(StreamingReceiver.java:61)
> at
> org.apache.kylin.stream.server.StreamingReceiver.main(StreamingReceiver.java:51)
> Caused by: com.fasterxml.jackson.core.JsonParseException: Unexpected
> character ('.' (code 46)): Expected space separating root-level values
>  at [Source: (byte[])"24.0.1.128"; line: 1, column: 6]
> at
> com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1840)
> at
> com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:712)
> at
> com.fasterxml.jackson.core.base.ParserMinimalBase._reportUnexpectedChar(ParserMinimalBase.java:637)
> at
> com.fasterxml.jackson.core.base.ParserMinimalBase._reportMissingRootWS(ParserMinimalBase.java:684)
> at
> com.fasterxml.jackson.core.json.UTF8StreamJsonParser._verifyRootSpace(UTF8StreamJsonParser.java:1659)
> at
> com.fasterxml.jackson.core.json.UTF8StreamJsonParser._parseFloat(UTF8StreamJsonParser.java:1626)
> at
> com.fasterxml.jackson.core.json.UTF8StreamJsonParser._parsePosNumber(UTF8StreamJsonParser.java:1393)
> at
> com.fasterxml.jackson.core.json.UTF8StreamJsonParser._nextTokenNotInObject(UTF8StreamJsonParser.java:854)
> at
> com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:753)
> at
> com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:4340)
> at
> com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4189)
> at
> com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3266)
> at
> org.apache.kylin.common.util.JsonUtil.readValue(JsonUtil.java:76)
> at
> org.apache.kylin.stream.coordinator.ZookeeperStreamMetadataStore.getCoordinatorNode(ZookeeperStreamMetadataStore.java:272)
> ... 6 more
>
> I did check in ZooKeeper and indeed the path is containing the plain
> string of the IP address instead of a valid JSON Node object that is
> present in ZK in our 3.1.0 installation:
>
> [hadoop@ip-24-0-1-128 kylin_stream]$ hbase zkcli
> Connecting to ip-24-0-1-124.us-west-2.compute.internal:2181
> Welcome to ZooKeeper!
> JLine support is disabled
>
> WATCHER::
>
> WatchedEvent state:SyncConnected type:None path:null
> get /kylin/kylin_metadata/stream/coordinator
> 24.0.1.128
> cZxid = 0x77
> ctime = Thu Dec 30 18:18:27 UTC 2021
> mZxid = 0x77
> mtime = Thu Dec 30 18:18:27 UTC 2021
> pZxid = 0x77
> cversion = 0
> dataVersion = 0
> aclVersion = 0
> ephemeralOwner = 0x0
> dataLength = 10
> numChildren = 0
>
> Do you have any idea what could be the issue?
>
> Regards,
> Gabor
>
>
>

Re: Kylin 3.1.3: Error when get coordinator leader

2022-01-05 Thread Yaqian Zhang

Hi Gabor:

In order to fix security issues, some modifications have been made to the 
real-time function of Kylin 3.1.3. 

You need to set "kylin.server.mode=stream_coordinator" in kylin.properties on 
the coordinator node; the coordinator node can no longer be set to 
"kylin.server.mode=all".

Thank you for your test and report. We will update the documentation later to remind 
Kylin users of this change.
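
As a rough illustration of the setting described above, the following Java sketch loads the text of a kylin.properties file and checks that kylin.server.mode is stream_coordinator. The class name CoordinatorModeCheck and the method isCoordinatorMode are made up for this example and are not part of Kylin; only the property key and its two values come from this thread.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper for illustration only; not part of Kylin.
final class CoordinatorModeCheck {

    // Returns true only when the properties text sets
    // kylin.server.mode to stream_coordinator.
    static boolean isCoordinatorMode(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            // A StringReader should not fail, but load() declares IOException.
            throw new UncheckedIOException(e);
        }
        String mode = props.getProperty("kylin.server.mode", "").trim();
        return "stream_coordinator".equals(mode);
    }

    public static void main(String[] args) {
        // prints "true"
        System.out.println(isCoordinatorMode("kylin.server.mode=stream_coordinator\n"));
        // prints "false": "all" is no longer accepted for the coordinator node
        System.out.println(isCoordinatorMode("kylin.server.mode=all\n"));
    }
}
```

In practice the text would be read from $KYLIN_HOME/conf/kylin.properties on the coordinator node; that path is an assumption about a typical installation.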

> On Jan 6, 2022, at 4:57 AM, Árki Gábor wrote:
> 
> Hi All,
> 
> I wanted to try out Kylin 3.1.3 with a clean installation but ran into an 
> issue. The stream receiver is unable to start due to the following error:
> 
> 2021-12-30 18:20:07,474 ERROR [main] server.StreamingReceiver:53 : streaming 
> receiver start fail
> org.apache.kylin.stream.coordinator.exception.StoreException: 
> com.fasterxml.jackson.core.JsonParseException: Unexpected character ('.' 
> (code 46)): Expected space separating root-level values
>  at [Source: (byte[])"24.0.1.128"; line: 1, column: 6]
> at 
> org.apache.kylin.stream.coordinator.ZookeeperStreamMetadataStore.getCoordinatorNode(ZookeeperStreamMetadataStore.java:276)
> at 
> org.apache.kylin.stream.coordinator.client.HttpCoordinatorClient.<init>(HttpCoordinatorClient.java:53)
> at 
> org.apache.kylin.stream.server.StreamingServer.<init>(StreamingServer.java:126)
> at 
> org.apache.kylin.stream.server.StreamingServer.getInstance(StreamingServer.java:141)
> at 
> org.apache.kylin.stream.server.StreamingReceiver.startStreamingServer(StreamingReceiver.java:67)
> at 
> org.apache.kylin.stream.server.StreamingReceiver.start(StreamingReceiver.java:61)
> at 
> org.apache.kylin.stream.server.StreamingReceiver.main(StreamingReceiver.java:51)
> Caused by: com.fasterxml.jackson.core.JsonParseException: Unexpected 
> character ('.' (code 46)): Expected space separating root-level values
>  at [Source: (byte[])"24.0.1.128"; line: 1, column: 6]
> at 
> com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1840)
> at 
> com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:712)
> at 
> com.fasterxml.jackson.core.base.ParserMinimalBase._reportUnexpectedChar(ParserMinimalBase.java:637)
> at 
> com.fasterxml.jackson.core.base.ParserMinimalBase._reportMissingRootWS(ParserMinimalBase.java:684)
> at 
> com.fasterxml.jackson.core.json.UTF8StreamJsonParser._verifyRootSpace(UTF8StreamJsonParser.java:1659)
> at 
> com.fasterxml.jackson.core.json.UTF8StreamJsonParser._parseFloat(UTF8StreamJsonParser.java:1626)
> at 
> com.fasterxml.jackson.core.json.UTF8StreamJsonParser._parsePosNumber(UTF8StreamJsonParser.java:1393)
> at 
> com.fasterxml.jackson.core.json.UTF8StreamJsonParser._nextTokenNotInObject(UTF8StreamJsonParser.java:854)
> at 
> com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:753)
> at 
> com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:4340)
> at 
> com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4189)
> at 
> com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3266)
> at org.apache.kylin.common.util.JsonUtil.readValue(JsonUtil.java:76)
> at 
> org.apache.kylin.stream.coordinator.ZookeeperStreamMetadataStore.getCoordinatorNode(ZookeeperStreamMetadataStore.java:272)
> ... 6 more
> 
> I did check in ZooKeeper and indeed the path is containing the plain string 
> of the IP address instead of a valid JSON Node object that is present in ZK 
> in our 3.1.0 installation:
> 
> [hadoop@ip-24-0-1-128 kylin_stream]$ hbase zkcli
> Connecting to ip-24-0-1-124.us-west-2.compute.internal:2181
> Welcome to ZooKeeper!
> JLine support is disabled
> 
> WATCHER::
> 
> WatchedEvent state:SyncConnected type:None path:null
> get /kylin/kylin_metadata/stream/coordinator
> 24.0.1.128
> cZxid = 0x77
> ctime = Thu Dec 30 18:18:27 UTC 2021
> mZxid = 0x77
> mtime = Thu Dec 30 18:18:27 UTC 2021
> pZxid = 0x77
> cversion = 0
> dataVersion = 0
> aclVersion = 0
> ephemeralOwner = 0x0
> dataLength = 10
> numChildren = 0
> 
> Do you have any idea what could be the issue?
> 
> Regards,
> Gabor

Re: [Kylin Security Notice] Impact analysis of Apache Log4j2 Remote Code Execution Vulnerability

2021-12-10 Thread ShaoFeng Shi

Yaqian, thank you for the information!

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC,
Apache Incubator PMC,
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org




Yaqian Zhang wrote on Fri, Dec 10, 2021 at 18:58:

> Hi all:
>
> This is a security notice about the impact of the Apache Log4j2
> Remote Code Execution Vulnerability on Apache Kylin.
>
> Background
>
> Apache Log4j2 is a Java-based logging tool that is widely used in the
> industry. The recently discovered Remote Code Execution Vulnerability in
> Apache Log4j2 allows an attacker who constructs a special request to
> trigger remote code execution in a program that uses Apache Log4j2.
>
> Scope of influence
>
> The version range of Log4j2 with the security vulnerability is: Apache Log4j
> 2.x <= 2.14.1.
> The currently released versions of Apache Kylin (Kylin 2.x, Kylin 3.x,
> Kylin 4.x) use log4j version 1.2.17 by default. However, Kylin's startup
> script loads jars from the Hadoop environment, including Hadoop, Spark,
> HBase, Hive and other components, and the log4j version used in a
> Hadoop 3 environment is generally Apache Log4j2. So if your Hadoop is
> version 3.0 or above, it is recommended to upgrade the Log4j2 of the Hadoop
> cluster to avoid the possibility of polluting Kylin services.
>
> Solution
>
> If the Hadoop components in the Kylin user's environment use Log4j2, the
> user needs to comprehensively upgrade Log4j2 to the latest 2.15.0-rc2 to
> prevent Kylin's scripts from loading a Log4j2 jar with security risks into
> Kylin's classpath.
> After the Log4j2 environment is fully upgraded, users can execute jinfo
> `cat pid` under $KYLIN_HOME to check whether the jar packages such as
> log4j-core-2.x.x.jar introduced into Kylin's classpath are the latest secure
> Log4j2 versions.
>
>
> Best Regards!
>
> Apache Kylin Team

Re: Using Kylin

2021-11-26 Thread Yaqian Zhang

Hi:

Take KYLIN_SALES_SALES_ID as an example. It is formed by joining the table name 
"KYLIN_SALES" and the column name "SALES_ID" with an underscore, and it identifies 
the global dictionary column you want to reuse.
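
The naming rule above can be sketched in Java as follows. The class HiveDictRefColumns and its methods are hypothetical names for this example; the underscore join and the comma-separated list follow the KYLIN_SALES_SALES_ID,KYLIN_SALES_BUYER_ID value quoted in this thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not part of Kylin.
final class HiveDictRefColumns {

    // Joins a table name and a column name into one reference entry,
    // e.g. KYLIN_SALES + SALES_ID -> KYLIN_SALES_SALES_ID.
    static String refColumn(String table, String column) {
        return table + "_" + column;
    }

    // Builds the comma-separated value for
    // kylin.dictionary.mr-hive.ref.columns from {table, column} pairs.
    static String refColumnsValue(List<String[]> tableColumnPairs) {
        return tableColumnPairs.stream()
                .map(pair -> refColumn(pair[0], pair[1]))
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        List<String[]> pairs = Arrays.asList(
                new String[] {"KYLIN_SALES", "SALES_ID"},
                new String[] {"KYLIN_SALES", "BUYER_ID"});
        // prints "kylin.dictionary.mr-hive.ref.columns=KYLIN_SALES_SALES_ID,KYLIN_SALES_BUYER_ID"
        System.out.println("kylin.dictionary.mr-hive.ref.columns=" + refColumnsValue(pairs));
    }
}
```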


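> On Nov 26, 2021, at 5:14 PM, maozhaolin wrote:
> 
> Hello, first of all thank you for your reply, but I still have one question: what do the two values that follow this parameter, KYLIN_SALES_SALES_ID and KYLIN_SALES_BUYER_ID, mean?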
> 
> 
> 
> 
> 
> At 2021-11-26 12:02:55, "Yaqian Zhang"  wrote:
> 
> Hi:
> 
> You can apply the same Hive dictionary configuration as the cube that built 
> the dictionary to the cube that needs to reuse it, and then add the 
> "kylin.dictionary.mr-hive.ref.columns" setting on top of that 
> configuration. 
> 
> An example of its use is as follows:
> 
> 
> Screenshot from Apache Kylin Document: 
> https://cwiki.apache.org/confluence/display/KYLIN/Introduction+to+Hive+Global+Dictionary
>  
> 
> 
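>> On Nov 26, 2021, at 10:59 AM, maozhaolin wrote:
>> 
>> Hello, I am using Kylin 3.1.2 and have used Hive to build a global dictionary. Now I want another cube to reuse this dictionary. How can I do that?
>> I saw the kylin.dictionary.mr-hive.ref.columns parameter on the official website, but there is no usage example. Could you tell me how to use this parameter?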
>> 
>> 
>>  
> 
> 
> 
>

Re: Using Kylin

2021-11-25 Thread Yaqian Zhang

Hi:

You can apply the same Hive dictionary configuration as the cube that built the 
dictionary to the cube that needs to reuse it, and then add the 
"kylin.dictionary.mr-hive.ref.columns" setting on top of that configuration. 

An example of its use is as follows:


Screenshot from Apache Kylin Document: 
https://cwiki.apache.org/confluence/display/KYLIN/Introduction+to+Hive+Global+Dictionary

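> On Nov 26, 2021, at 10:59 AM, maozhaolin wrote:
> 
> Hello, I am using Kylin 3.1.2 and have used Hive to build a global dictionary. Now I want another cube to reuse this dictionary. How can I do that?
> I saw the kylin.dictionary.mr-hive.ref.columns parameter on the official website, but there is no usage example. Could you tell me how to use this parameter?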
> 
> 
>

Re: maozhaolin

2021-11-02 Thread Xiaoxiang Yu

Hi,
Unfortunately, there is no tool for migrating a global dictionary; 
you have to rebuild the cube and delete the old ones.






--
Xiaoxiang Yu
















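At 2021-11-02 17:34:28, "maozhaolin" wrote:

Hello, in our use of Kylin we initially built the dictionaries with Kylin's own job 
server, but we have now hit a bottleneck and need to switch to distributed dictionary building. How can the dictionary data be migrated, and what encoding format do the dictionaries created by Kylin itself use?
The Kylin version is 3.1.2.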
