Discussion: Propose to optimize some features which are rarely used.

2024-04-06 Thread Liang Chen
Dear community

I have one proposal: I suggest optimizing some features which are rarely
used.
Please vote on the modules below, on whether they can be optimized or removed
(a rough usage sketch of these features follows the list, for context):

index/secondary-index
index/bloom
index/lucene
geo
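
For context, the features in question are typically used roughly like this
(a sketch from memory; the exact DDL may differ by version, and the table and
column names are only illustrative):

spark.sql("CREATE INDEX si_idx ON TABLE sales (city) AS 'carbondata'")     // secondary index
spark.sql("CREATE INDEX bloom_idx ON TABLE sales (id) AS 'bloomfilter'")   // bloom index
spark.sql("CREATE INDEX text_idx ON TABLE sales (notes) AS 'lucene'")      // lucene index
// geo: the spatial index is declared through SPATIAL_INDEX table properties at CREATE TABLE time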

Regards
Liang


Re: C++ implementation requirement

2024-04-06 Thread Liang Chen
Dear community

Supporting a C++ implementation will be a big piece of work. Who can start this task?
We can first discuss the design ideas.
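
As a starting point for the design discussion, the Java SDK reader that a C++
implementation would need to mirror looks roughly like this (a sketch from the
SDK guide as I remember it; exact signatures may differ, and the path is only
illustrative):

import org.apache.carbondata.sdk.file.CarbonReader

val reader = CarbonReader.builder("/tmp/carbon/output", "_temp")
  .projection(Array("name", "age"))
  .build()
while (reader.hasNext) {
  println(reader.readNextRow)   // one row of the projected columns
}
reader.close()

A C++ reader would need the same pieces: schema/footer decoding, page
decompression, and a row or columnar iteration API.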

Regards
Liang

On 2023/10/23 06:09:39 Liang Chen wrote:
> Dear Jacky
> 
> Thanks for bringing this discussion to the community.
> Good idea, +1 from my side; a C++ implementation could bring a much bigger
> performance improvement.
> 
> Regards
> Liang
> 
> Jacky Li wrote on Monday, 23 October 2023 at 04:47:
> 
> > Hi community,
> >
> >
> >   I have been using Apache CarbonData with Apache Spark to
> > process data for a long time. As data analytics requirements grow,
> > many emerging engines need to read the same data, and most of
> > them are written in C++. So I am wondering whether there is a C++ implementation of the
> > CarbonData reader and writer out there, or whether the community is interested
> > in implementing one? I can help if someone has already started doing it.
> >
> >
> > Regards,
> > Jacky
> 


Re: DISCUSSION : Need to optimize the supported version of Spark

2024-03-31 Thread Liang Chen
+1, agreed.
To David: can you please take on this task?

Regards
Liang


蔡强 wrote on Friday, 29 March 2024 at 13:19:

> +1
>
> I suggest integrating with Spark 3.5.1; there is no need to support old versions
>
> On Sun, 17 Mar 2024 at 21:49, Liang Chen  wrote:
>
> > Dear community
> >
> > Apache CarbonData integrates with multiple Spark engine versions. It is time to
> > upgrade the supported version and drop the old versions that no longer need to be
> > supported.
> >
> > Please share your comments.
> >
> > Regards
> > Liang
> >
>


DISCUSSION : Need to optimize the supported version of Spark

2024-03-17 Thread Liang Chen
Dear community

Apache CarbonData integrates with multiple Spark engine versions. It is time to
upgrade the supported version and drop the old versions that no longer need to be
supported.

Please share your comments.

Regards
Liang


Re: Personal question related to the current board report

2024-03-12 Thread Liang Chen
Dear Chris

Thank you for contacting the community.

In the past 6 months, around 5-6 contributors committed source code to the
project.

It is true that the community needs to further encourage more contributors; we
will push for it. Thanks again.

Regards
Liang

Christofer Dutz wrote on Sunday, 14 January 2024 at 14:12:

> Hi all,
>
> I am currently reviewing the activity of CarbonData as part of my reviews
> for the next board meeting. I see that I commented on this in past
> reporting periods, but never got any feedback on it.
>
> When having a look at the commits@ list, I see only one person
> committing. Now this seems to be related to how these notifications work.
> If a person merges a PR, the person merging is set as sending user.
> However, I do ask myself, why is only one person in this project merging
> PRs?
>
> Chris
>


Re: C++ implementation requirement

2024-01-05 Thread Liang Chen
Dear Jacky

Good idea; it will improve query performance significantly. We need to
discuss further how to start. Maybe you can first prepare a proposal
document and send it to the dev mailing list for discussion.

Regards
Liang

Jacky Li wrote on Monday, 23 October 2023 at 04:47:

> Hi community,
>
>
>   I have been using Apache CarbonData with Apache Spark to
> process data for a long time. As data analytics requirements grow,
> many emerging engines need to read the same data, and most of
> them are written in C++. So I am wondering whether there is a C++ implementation of the
> CarbonData reader and writer out there, or whether the community is interested
> in implementing one? I can help if someone has already started doing it.
>
>
> Regards,
> Jacky


[ANNOUNCE] Apache CarbonData 2.3.1 Released

2023-11-25 Thread Liang Chen
Hi All,

The Apache CarbonData PMC team is happy to announce the release of Apache
CarbonData version 2.3.1

This release fixed the following issues:
   - Fixed performance issues by changing the index id
   - Fixed Secondary Index up to segment level with SI as a datamap, made
Secondary Index a coarse-grained datamap, and used SI for Presto queries
   - Fixed an exception when loading data with overwrite on a partition table
   - Fixed Spark integration compile issues
   - Fixed index issues with "sort_columns"
   - Fixed index issues with tables across multiple sessions
   - Fixed a DDM statement failure related to column queries

You can find the release at :
https://dist.apache.org/repos/dist/release/carbondata/2.3.1/

You can follow this document to use these artifacts:
https://github.com/apache/carbondata/blob/master/docs/quick-start-guide.md


You can find the latest CarbonData document and learn more at:
http://carbondata.apache.org 

Thanks
The Apache CarbonData team


[RESULT] Re: [VOTE] Apache CarbonData 2.3.1-rc1 release

2023-11-25 Thread Liang Chen
Hi all

The PMC vote has passed for the Apache CarbonData 2.3.1 release; the result is as
below:

+1 (binding): 6 (Liang Chen, Jacky, Akash, Kunal, Indhumathi M, Bo Xu)


Thanks all for your vote.

Regards
Liang

Liang Chen wrote on Saturday, 25 November 2023 at 09:49:

> +1
>
> Regards
> Liang
>
> Indhumathi M wrote on Tuesday, 21 November 2023 at 19:11:
>
>> +1
>>
>> Regards,
>> Indhumathi M
>>
>>
>> On Tue, 21 Nov 2023 at 1:07 AM, Liang Chen 
>> wrote:
>>
>> > Hi
>> >
>> >
>> >
>> > I submit the Apache CarbonData 2.3.1 (RC1) for your vote.
>> >
>> > 1. Key features of this release are highlighted below:
>> >
>> >  - Fixed performance issues by changing index id.
>> >
>> >  - Fixed Secondary Index till segment level with SI as
>> > datamap, make Secondary Index as a coarse grain Datamap and use SI for
>> > Presto queries
>> >
>> >  - Fixed Exception in loading data with overwrite on
>> > partition table.
>> >
>> >  - Fixed spark integration compile issues.
>> >
>> >  - Fixed Exception in loading data with overwrite on
>> > partition table.
>> >
>> >  - Fixed index issues of "sort_columns"
>> >
>> >  - Fixed index issues about multiple sessions table.
>> >
>> >  - Fixed Exception in loading data with overwrite on
>> > partition table.
>> >
>> >  - Fixed DDM sentence about column query failed.
>> >
>> >
>> >
>> >  2. The tag to be voted upon : apache-carbondata-2.3.1-rc1 :
>> >
>> >
>> https://github.com/apache/carbondata/commit/d326118db86414a7895a76dafebb881ba1e52c1c
>> >
>> >
>> > 3.The artifacts to be voted on are located here:
>> > https://dist.apache.org/repos/dist/dev/carbondata/2.3.1-rc1/
>> >
>> >
>> > 4. A staged Maven repository is available for review at:
>> >
>> >
>> https://repository.apache.org/content/repositories/orgapachecarbondata-1075/
>> >
>> >
>> > 5. Release artifacts are signed with the following key:
>> >
>> > https://people.apache.org/keys/committer/chenliang613.asc
>> >
>> > Please vote on releasing this package as Apache CarbonData 2.3.1,  The
>> vote
>> > will be open for the next 72 hours and passes if a majority of
>> >
>> > at least three +1 PMC votes are cast.
>> >
>> > [ ] +1 Release this package as Apache CarbonData 2.3.1
>> >
>> > [ ] 0 I don't feel strongly about it, but I'm okay with the release
>> >
>> > [ ] -1 Do not release this package because...
>> >
>>
>


Re: [VOTE] Apache CarbonData 2.3.1-rc1 release

2023-11-25 Thread Liang Chen
+1

Regards
Liang

Indhumathi M wrote on Tuesday, 21 November 2023 at 19:11:

> +1
>
> Regards,
> Indhumathi M
>
>
> On Tue, 21 Nov 2023 at 1:07 AM, Liang Chen 
> wrote:
>
> > Hi
> >
> >
> >
> > I submit the Apache CarbonData 2.3.1 (RC1) for your vote.
> >
> > 1. Key features of this release are highlighted below:
> >
> >  - Fixed performance issues by changing index id.
> >
> >  - Fixed Secondary Index till segment level with SI as
> > datamap, make Secondary Index as a coarse grain Datamap and use SI for
> > Presto queries
> >
> >  - Fixed Exception in loading data with overwrite on
> > partition table.
> >
> >  - Fixed spark integration compile issues.
> >
> >  - Fixed Exception in loading data with overwrite on
> > partition table.
> >
> >  - Fixed index issues of "sort_columns"
> >
> >  - Fixed index issues about multiple sessions table.
> >
> >  - Fixed Exception in loading data with overwrite on
> > partition table.
> >
> >  - Fixed DDM sentence about column query failed.
> >
> >
> >
> >  2. The tag to be voted upon : apache-carbondata-2.3.1-rc1 :
> >
> >
> https://github.com/apache/carbondata/commit/d326118db86414a7895a76dafebb881ba1e52c1c
> >
> >
> > 3.The artifacts to be voted on are located here:
> > https://dist.apache.org/repos/dist/dev/carbondata/2.3.1-rc1/
> >
> >
> > 4. A staged Maven repository is available for review at:
> >
> >
> https://repository.apache.org/content/repositories/orgapachecarbondata-1075/
> >
> >
> > 5. Release artifacts are signed with the following key:
> >
> > https://people.apache.org/keys/committer/chenliang613.asc
> >
> > Please vote on releasing this package as Apache CarbonData 2.3.1,  The
> vote
> > will be open for the next 72 hours and passes if a majority of
> >
> > at least three +1 PMC votes are cast.
> >
> > [ ] +1 Release this package as Apache CarbonData 2.3.1
> >
> > [ ] 0 I don't feel strongly about it, but I'm okay with the release
> >
> > [ ] -1 Do not release this package because...
> >
>


Discussion: The community plans to clean up old invalid PRs and PRs that have been WIP for a long time.

2023-11-05 Thread Liang Chen
Dear community

The community plans to clean up old invalid PRs and PRs that have been
WIP for a long time.
Please share your comments, and check the PR list under your name.

Regards


Re: C++ implementation requirement

2023-10-23 Thread Liang Chen
Dear Jacky

Thanks for bringing this discussion to the community.
Good idea, +1 from my side; a C++ implementation could bring a much bigger
performance improvement.

Regards
Liang

Jacky Li wrote on Monday, 23 October 2023 at 04:47:

> Hi community,
>
>
>   I have been using Apache CarbonData with Apache Spark to
> process data for a long time. As data analytics requirements grow,
> many emerging engines need to read the same data, and most of
> them are written in C++. So I am wondering whether there is a C++ implementation of the
> CarbonData reader and writer out there, or whether the community is interested
> in implementing one? I can help if someone has already started doing it.
>
>
> Regards,
> Jacky


Dear community

2023-10-19 Thread Liang Chen
As you know, CarbonData as a data store and data format is already quite good
and mature.
I want to create this thread on the mailing list to openly discuss what the
next milestones of the CarbonData project should be.
One proposal from my side: we should consider how to integrate with AI
computing engines.
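
As one possible concrete starting point (only an illustrative sketch with
hypothetical table and column names, not a committed plan), data in CarbonData
tables can already be fed to Spark MLlib through a DataFrame:

import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.LinearRegression

// hypothetical carbon table with numeric feature columns f1, f2 and a label column
val df = spark.sql("SELECT f1, f2, label FROM training_carbon")
val assembled = new VectorAssembler()
  .setInputCols(Array("f1", "f2"))
  .setOutputCol("features")
  .transform(df)
val model = new LinearRegression().setLabelCol("label").fit(assembled)

Deeper integration, for example feeding PyTorch/TensorFlow readers directly
from carbon files, would need its own design discussion.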

Regards
Liang


Invite Bo Xu as new release manager . Re: [ANNOUNCE] Bo Xu as new PMC for Apache CarbonData

2023-10-18 Thread Liang Chen
Dear community

I would like to propose Bo Xu, a PMC member, to take charge of the next release as release manager.

Regards
Liang

Liang Chen wrote on Monday, 24 April 2023 at 20:57:

> Hi
>
>
> We are pleased to announce Bo Xu as a new PMC member for Apache CarbonData.
>
>
> Congrats to Bo Xu!
>
>
> Apache CarbonData PMC
>


[ANNOUNCE] Bo Xu as new PMC for Apache CarbonData

2023-04-24 Thread Liang Chen
Hi


We are pleased to announce Bo Xu as a new PMC member for Apache CarbonData.


Congrats to Bo Xu!


Apache CarbonData PMC


Re: Newbie Question: Challenges with Getting Started

2023-04-06 Thread Liang Chen
Hi

It should be OK now; please try it again.
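
If it still fails on OpenJDK 11 with the "module java.base does not export
jdk.internal.ref" error quoted below, one workaround you could try (an untested
suggestion from my side, not an official recommendation) is to export that
package to unnamed modules when launching spark-shell:

spark-shell --conf spark.sql.extensions=org.apache.spark.sql.CarbonExtensions \
  --jars <path to the carbondata jar> \
  --conf "spark.driver.extraJavaOptions=--add-exports java.base/jdk.internal.ref=ALL-UNNAMED" \
  --conf "spark.executor.extraJavaOptions=--add-exports java.base/jdk.internal.ref=ALL-UNNAMED"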

Regards


Sandeep N wrote on Monday, 18 April 2022 at 19:47:

> Hi all,
>
> I ran into carbondata and started trying it out.  I am following this page
> https://carbondata.apache.org/quick-start-guide.html.
>
> So far I have downloaded
> apache-carbondata-2.3.0-bin-spark3.1.1-hadoop2.7.2.jar. I am trying to use
> it with Spark 3.1.3 (I imagine the micro version difference should not
> matter).
>
> So far I have tried this with OpenJDK 8 and OpenJDK 11 and in both
> instances, create table works however when I attempt to load data from CSV
> it fails with the exception below. This is a different csv from what is
> called out on that quick-start page.
>
> Here is how I am launching Carbondata
> spark-shell --conf
> spark.sql.extensions=org.apache.spark.sql.CarbonExtensions --jars <path to the above jar>
>
> I am getting failures on both OpenJDK 11 and OpenJDK 8. Open JDK 8 fails
> with a segfault. I am running on a MacBook pro. The OpenJDK 11 errors seem
> to indicate that is not supported but JDK 8 seems to crash and exit. Please
> see the errors below, can someone point out what I am doing wrong?
>
> Error on OpenJDK 8 when I try to load data from a CSV file
> ==
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x000104cbd7bb, pid=17765,
> tid=0xbd03
> #
> # JRE version: OpenJDK Runtime Environment (8.0_282) (build
> 1.8.0_282-bre_2021_01_20_16_06-b00)
> # Java VM: OpenJDK 64-Bit Server VM (25.282-b00 mixed mode bsd-amd64
> compressed oops)
> # Problematic frame:
> # V  [libjvm.dylib+0x5667bb]
>
> Error on OpenJDK 11 when I try to load data from a CSV file
> ==
> 22/04/17 23:57:06 ERROR CarbonFactDataHandlerColumnar: Error in producer
> java.lang.reflect.InaccessibleObjectException: Unable to make public void
> jdk.internal.ref.Cleaner.clean() accessible: module java.base does not
> "exports jdk.internal.ref" to unnamed module @5115e1e6
> at
>
> java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:340)
> at
>
> java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:280)
> at
> java.base/java.lang.reflect.Method.checkCanSetAccessible(Method.java:198)
> at java.base/java.lang.reflect.Method.setAccessible(Method.java:192)
> at
>
> org.apache.carbondata.core.memory.UnsafeMemoryManager.destroyDirectByteBuffer(UnsafeMemoryManager.java:232)
> at
> org.apache.carbondata.core.datastore.page
> .LVByteBufferColumnPage.ensureMemory(LVByteBufferColumnPage.java:125)
> at
> org.apache.carbondata.core.datastore.page
> .LVByteBufferColumnPage.putBytes(LVByteBufferColumnPage.java:97)
> at
> org.apache.carbondata.core.datastore.page
> .LocalDictColumnPage.putBytes(LocalDictColumnPage.java:139)
> at
>
> org.apache.carbondata.core.datastore.page.ColumnPage.putData(ColumnPage.java:413)
> at
>
> org.apache.carbondata.processing.store.TablePage.convertToColumnarAndAddToPages(TablePage.java:241)
> at
> org.apache.carbondata.processing.store.TablePage.addRow(TablePage.java:201)
> at
>
> org.apache.carbondata.processing.store.CarbonFactDataHandlerColumnar.processDataRows(CarbonFactDataHandlerColumnar.java:397)
> at
>
> org.apache.carbondata.processing.store.CarbonFactDataHandlerColumnar.access$500(CarbonFactDataHandlerColumnar.java:60)
> at
>
> org.apache.carbondata.processing.store.CarbonFactDataHandlerColumnar$Producer.call(CarbonFactDataHandlerColumnar.java:637)
> at
>
> org.apache.carbondata.processing.store.CarbonFactDataHandlerColumnar$Producer.call(CarbonFactDataHandlerColumnar.java:614)
> at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> at
>
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at
>
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:834)
>
> Error on OpenJDK 11 when I try to insert a single record
> ==
> 22/04/18 09:46:10 ERROR CarbonFactDataHandlerColumnar: Error in producer
> java.lang.reflect.InaccessibleObjectException: Unable to make public void
> jdk.internal.ref.Cleaner.clean() accessible: module java.base does not
> "exports jdk.internal.ref" to unnamed module @cc4787f
> at
>
> java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:340)
> at
>
> java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:280)
> at
> java.base/java.lang.reflect.Method.checkCanSetAccessible(Method.java:198)
> at java.base/java.lang.reflect.Method.setAccessible(Method.java:192)
> at
>
> org.apache.carbondata.core.memory.UnsafeMemoryManager.destroyDirectByteBuffer(UnsafeMemoryManager.java:232)
> at
> org.apache.carbondata.core.datastore.page
> .LVByteBufferColumnPage.ensureMemory(LVByteBufferColumnPage.java:125)

Re: Error while creating table

2023-04-06 Thread Liang Chen
The user group is  u...@carbondata.apache.org

Regards

Xinyu Zeng wrote on Monday, 25 April 2022 at 11:13:

> Hi,
>
> Since there is no user group, I am using this email list to ask
> questions. Please let me know if there are other platforms for users
> to discuss.
>
> I am new to CarbonData and am following the quick start guide. On
> Ubuntu 20.04, I installed spark-3.1.1-bin-hadoop2.7.tgz and
> apache-carbondata-2.3.0-bin-spark3.1.1-hadoop2.7.2.jar. By using
> SparkSQL CLI, I got an error message while following the quick start
> guide(at the end of this email). Could someone give me some help?
> Thanks!
>
> Shawn
>
> java.lang.IncompatibleClassChangeError: class
> org.apache.spark.sql.hive.CarbonRelation has interface
> org.apache.spark.sql.catalyst.plans.logical.LeafNode as super class
> at java.base/java.lang.ClassLoader.defineClass1(Native Method)
> at
> java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
> at
> java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:174)
> at java.base/java.net
> .URLClassLoader.defineClass(URLClassLoader.java:555)
> at java.base/java.net
> .URLClassLoader$1.run(URLClassLoader.java:458)
> at java.base/java.net
> .URLClassLoader$1.run(URLClassLoader.java:452)
> at java.base/java.security.AccessController.doPrivileged(Native
> Method)
> at java.base/java.net
> .URLClassLoader.findClass(URLClassLoader.java:451)
> at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:589)
> at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
> at
> org.apache.spark.sql.hive.CarbonMetaStoreFactory$.createCarbonMetaStore(CarbonMetaStore.scala:189)
> at org.apache.spark.sql.CarbonEnv.init(CarbonEnv.scala:137)
> at org.apache.spark.sql.CarbonEnv$.getInstance(CarbonEnv.scala:176)
> at
> org.apache.spark.sql.parser.CarbonExtensionSqlParser.parsePlan(CarbonExtensionSqlParser.scala:44)
> at
> org.apache.spark.sql.SparkSession.$anonfun$sql$2(SparkSession.scala:616)
> at
> org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
> at
> org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:616)
> at
> org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
> at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613)
> at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:651)
> at
> org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:67)
> at
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:384)
> at
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1(SparkSQLCLIDriver.scala:504)
> at
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1$adapted(SparkSQLCLIDriver.scala:498)
> at scala.collection.Iterator.foreach(Iterator.scala:943)
> at scala.collection.Iterator.foreach$(Iterator.scala:943)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
> at scala.collection.IterableLike.foreach(IterableLike.scala:74)
> at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
> at
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processLine(SparkSQLCLIDriver.scala:498)
> at
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:287)
> at
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
> at
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
> Method)
> at
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> at
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
> at org.apache.spark.deploy.SparkSubmit.org
> $apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
> at
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> at
> org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> at
> org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> at
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
> at
> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> java.lang.IncompatibleClassChangeError: class
> org.apache.spark.sql.hive.CarbonRelation has interface
> org.apache.spark.sql.catalyst.plans.logical.LeafNode as 

Discussion: the Jenkins CI of the ASF is not working well, any suggestions? We need to change to a new free CI

2023-04-06 Thread Liang Chen
Dear community

The Jenkins CI of the ASF is not working well. Any suggestions? We need to
change to a new free CI.


How a DBS Data Platform Drives Real-time Insights & Analytics using Apache CarbonData:

2023-02-12 Thread Liang Chen
Dear

How a DBS Data Platform Drives Real-time Insights & Analytics using Apache
CarbonData:
Please find the link :
https://www.youtube.com/watch?v=cDYkmwMoCEA=1779s

Regards
Liang


Re: CI/CD and Carbondata build

2022-12-03 Thread Liang Chen
Dear JB

Great! Thanks for your contribution.

Regards
Liang

Jean-Baptiste Onofré wrote on Saturday, 3 December 2022 at 07:59:

> Thanks guys.
>
> I will resume my work on this.
>
> I will keep you posted asap.
>
> Regards
> JB
>
> On Fri, Dec 2, 2022 at 7:11 AM Akash r  wrote:
> >
> > +1
> >
> > Regards,
> > Akash R N
> >
> >
> > On Fri, 25 Nov 2022 at 10:20 AM, Jean-Baptiste Onofré 
> wrote:
> >>
> >> Hi guys
> >>
> >> No objections ?
> >>
> >> I will then move forward with proposed plan.
> >>
> >> Regards
> >> JB
> >>
> >> On Fri, 18 Nov 2022 at 07:46, Jean-Baptiste Onofré
> >> wrote:
> >>
> >> > Hi guys
> >> > I started to work on the build and Apache Jenkins.
> >> >
> >> > I have couple of proposals:
> >> >
> >> > 1. I will replace the "static" Jenkins configuration by a pipeline
> >> > using Jenkinsfile directly in CarbonData repo. It's easier to manage,
> >> > anyone can update the Jenkinsfile (adding steps, etc).
> >> > 2. The carbondata-format module has some prerequisite to build: it
> >> > needs thrift installed in the PATH for instance. It could be an issue
> >> > on Jenkins. I propose to manually deploy a SNAPSHOT for format for
> >> > now, I will check if I can have thrift on the Jenkins workers.
> >> >
> >> > Any objections?
> >> >
> >> > Thanks,
> >> > Regards
> >> > JB
> >> >
>


Re: CI/CD and Carbondata build

2022-12-01 Thread Liang Chen
+1
Thanks for JB.

Regards
Liang


Jean-Baptiste Onofré wrote on Friday, 25 November 2022 at 05:50:

> Hi guys
>
> No objections ?
>
> I will then move forward with proposed plan.
>
> Regards
> JB
>
> On Fri, 18 Nov 2022 at 07:46, Jean-Baptiste Onofré
> wrote:
>
> > Hi guys
> > I started to work on the build and Apache Jenkins.
> >
> > I have couple of proposals:
> >
> > 1. I will replace the "static" Jenkins configuration by a pipeline
> > using Jenkinsfile directly in CarbonData repo. It's easier to manage,
> > anyone can update the Jenkinsfile (adding steps, etc).
> > 2. The carbondata-format module has some prerequisite to build: it
> > needs thrift installed in the PATH for instance. It could be an issue
> > on Jenkins. I propose to manually deploy a SNAPSHOT for format for
> > now, I will check if I can have thrift on the Jenkins workers.
> >
> > Any objections?
> >
> > Thanks,
> > Regards
> > JB
> >
>


[ANNOUNCE] Brijoo as new PMC for Apache CarbonData

2022-09-24 Thread Liang Chen
Hi


We are pleased to announce Brijoo as a new PMC member for Apache CarbonData.


Congrats to Brijoo!


Apache CarbonData PMC


Re: Correct way to ramp up on Carbondata

2022-05-12 Thread Liang Chen
Hi

First, I suggest installing IntelliJ IDEA to run these
examples:
https://github.com/apache/carbondata/tree/master/examples/spark/src/main/scala/org/apache/carbondata/benchmark


Second: below is my test script; please take it as a reference:

carbon_jar=./carbonlib/$(ls -1 carbonlib |grep "^apache-carbondata.*\.jar$")
./bin/spark-shell --master local --jars ${carbon_jar} --driver-memory 4G
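
If you prefer to drive it from a script instead (as in the mini_test.scala
mentioned below), a minimal smoke test could look roughly like this (it assumes
spark.sql.extensions=org.apache.spark.sql.CarbonExtensions is set as in the
quick start guide; the table name is only illustrative):

spark.sql("CREATE TABLE IF NOT EXISTS carbon_smoke (id INT, name STRING) STORED AS carbondata")
spark.sql("INSERT INTO carbon_smoke VALUES (1, 'a'), (2, 'b')")
spark.sql("SELECT count(*) FROM carbon_smoke").show()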

Regards
Liang

Xinyu Zeng wrote on Monday, 9 May 2022 at 15:01:

> Hi, I am new to Carbondata and trying to get my hands on but it seems
> like the doc is really frustrating. I encountered several issues
> including this one on
> jira:https://issues.apache.org/jira/browse/CARBONDATA-4334.
>
> My core question is, I simply want to test CarbonData performance
> locally without HDFS. Following the Quick Start guide, it seems that I
> can only do this via spark-shell? Currently I put my commands in a
> scala file and execute
>
> spark-shell --conf
> spark.sql.extensions=org.apache.spark.sql.CarbonExtensions --jars
> ./apache-carbondata-2.3.0-bin-spark2.3.4-hadoop2.7.2.jar -i
> mini_test.scala
>
>
> But this seems a little stupid, and I do not know where should I put
> the configuration file in. Now I can only do the configuration tuning
> inside the mini_test.scala.
>
> Would be grateful if someone can help.
>
> Thanks,
> Shawn
>


Re: Performance Engineering Track CFP for ApacheCon NA New Orleans

2022-04-10 Thread Liang Chen
Hi Sharan

Thanks for kindly sharing this information; we will check it.

Regards
Liang

sharanf wrote on Thursday, 7 April 2022 at 18:00:

> Hi All
>
> I hope that you have already heard that ApacheCon NA is back as a live
> event in New Orleans later this year. You can find out more details
> here: https://apachecon.com/acna2022/
>
> For the first time ever - we will be running a Performance Engineering
> track. So what is Performance Engineering? You can find a definition and
> a track description here: https://s.apache.org/3ykqk
>
> So why are you getting this message? Well - we took a look at all the
> ASF projects that may have an interest in Performance Engineering and
> this projects was on the list :-)
>
> If you are interested in making a submission to this new track then you
> can find a link to the CFP here: https://apachecon.com/acna2022/cfp.html
>
> We are looking forward to receiving your submissions and hopefully
> seeing those of you who can make it to New Orleans in October.
>
> Thanks
> Sharan


[ANNOUNCE] Indhumathi M as new PMC for Apache CarbonData

2022-02-15 Thread Liang Chen
Hi

We are pleased to announce Indhumathi M as a new PMC member for Apache
CarbonData.


Congrats to Indhumathi M!


Apache CarbonData PMC


[ANNOUNCE] Vikram Ahuja as new Apache CarbonData committer

2022-02-11 Thread Liang Chen
We are pleased to announce that the PMC has invited Vikram Ahuja as a new

Apache CarbonData committer, and the invite has been accepted!


Congrats to Vikram Ahuja and welcome aboard.


Regards

Apache CarbonData PMC


Re: [DISCUSSION] Log4j2 Vulnerability (CVE-2021-44228, CVE-2021-45046,CVE-2021-45105) Analysis

2022-01-09 Thread Liang Chen
Thanks, Indhumathi.
This analysis would be very helpful for us.
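
For anyone who wants to double-check their own build (a suggestion from my
side, not part of the analysis quoted below), the dependency tree can be
inspected for log4j artifacts, for example:

mvn dependency:tree -Dincludes=org.apache.logging.log4j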

Regards
Liang

On 2021/12/30 12:31:12 Indhumathi M wrote:
> Hello all, this discussion is related to a Log4j2 vulnerability.
> 
> As you may be aware, there has been a critical vulnerability in Log4j2, the
> Java Logging Library,
> 
> that could result in Remote Code Execution (RCE) if an affected version of
> log4j (2.0 <= log4j <= 2.15.0)
> 
> logs an attacker-controlled string value without proper validation. Please
> see more details on CVE-2021-44228
> .
> 
> We currently believe that the Apache CarbonData platform is not impacted.
> Apache CarbonData does not
> 
> directly use a version of log4j known to be affected by the vulnerability.
> We have reviewed the code and
> 
> run the vulnerability tool, as per the tool report, these three
> vulnerabilities (CVE-2021-44228,
> 
> CVE-2021-45046,CVE-2021-45105) are not identified.
> 
> 
> Regards,
> 
> Indhumathi M
> 


Re: [VOTE] Apache CarbonData 2.3.0(RC1) release

2021-12-20 Thread Liang Chen
-1 from my side.
We need to consider these PRs: 4240, 4241, 4242, 4243

Regards
Liang

Kunal Kapoor wrote on Monday, 20 December 2021 at 9:39 PM:

> Hi All,
>
> I submit the Apache CarbonData 2.3.0(RC1) for your vote.
>
>
> *1.Release Notes:*
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12349262=Html=12320220=Create_token=A5KQ-2QAV-T4JA-FDED_e7564140ee4c259084ecff7746af846d0c968ea9_lin
>
> *Some key features and improvements in this release:*
>
>- Support spatial index creation using data frame
>- Upgrade prestosql to 333 version
>- Support Carbondata Streamer tool to fetch data incrementally and merge
>- Support DPP for carbon filters
>- Alter support for complex types
>
>  *2. The tag to be voted upon* : apache-carbondata-2.3.0-rc1
> 
>
> Commit: 70065894d02ce2e898b1ed3cd7b0b10f6305db44
> <
> https://github.com/apache/carbondata/commit/70065894d02ce2e898b1ed3cd7b0b10f6305db44
> >
>
> *3. The artifacts to be voted on are located here:*
> https://dist.apache.org/repos/dist/dev/carbondata/2.3.0-rc1/
>
> *4. A staged Maven repository is available for review at:*
> https://repository.apache.org/content/repositories/orgapachecarbondata-1072
>
> *5. Release artifacts are signed with the following key:*
> https://people.apache.org/keys/committer/kunalkapoor.asc
>
>
> Please vote on releasing this package as Apache CarbonData 2.3.0,  The
> vote will
> be open for the next 72 hours and passes if a majority of at least three +1
> PMC votes are cast.
>
> [ ] +1 Release this package as Apache CarbonData 2.3.0
>
> [ ] 0 I don't feel strongly about it, but I'm okay with the release
>
> [ ] -1 Do not release this package because...
>
>
> Regards,
> Kunal Kapoor
>


Apache carbondata topics at APACHECON ASIA 2021

2021-08-26 Thread Liang Chen
1. How a DBS Data Platform Drives Real-time Insights & Analytics using
Apache CarbonData:
https://www.youtube.com/watch?v=cDYkmwMoCEA

2.Faster Bigdata Analytics By Maneuvering Apache Carbondata’S Indexes:
https://www.youtube.com/watch?v=aXSsN1eITs0


Re: [VOTE] Apache CarbonData 2.2.0(RC2) release

2021-08-04 Thread Liang Chen
+1

Regards
Liang

Ajantha Bhat wrote on Tuesday, 3 August 2021 at 1:07 PM:

> +1
>
> Regards,
> Ajantha
>
> On Mon, Aug 2, 2021 at 9:03 PM Venkata Gollamudi 
> wrote:
>
> > +1
> >
> > Regards,
> > Venkata Ramana
> >
> > On Mon, 2 Aug, 2021, 20:18 Kunal Kapoor, 
> wrote:
> >
> > > +1
> > >
> > > Regards
> > > Kunal Kapoor
> > >
> > > On Mon, 2 Aug 2021, 4:53 pm Kumar Vishal, 
> > > wrote:
> > >
> > > > +1
> > > > Regards
> > > > Kumar Vishal
> > > >
> > > > On Mon, 2 Aug 2021 at 2:28 PM, Indhumathi M  >
> > > > wrote:
> > > >
> > > > > +1
> > > > >
> > > > > Regards,
> > > > > Indhumathi M
> > > > >
> > > > > On Mon, Aug 2, 2021 at 12:33 PM Akash Nilugal <
> > akashnilu...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > I submit the Apache CarbonData 2.2.0(RC2) for your vote.
> > > > > >
> > > > > >
> > > > > > *1.Release Notes:*
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12347869=Html=12320220=Create_token=A5KQ-2QAV-T4JA-FDED_d44fca7058ab2c2a2a4a24e02264cc701f7d10b8_lin
> > > > > >
> > > > > >
> > > > > > *Some key features and improvements in this release:*
> > > > > >- Integrate with Apache Spark-3.1
> > > > > >- Leverage Secondary Index till segment level with SI as
> datamap
> > > and
> > > > > SI
> > > > > > with plan rewrite
> > > > > >- Make Secondary Index as a coarse grain datamap and use
> > secondary
> > > > > > indexes for Presto queries
> > > > > >- Support rename SI table
> > > > > >- Support describe column
> > > > > >- Local sort Partition Load and Compaction improvement
> > > > > >- GeoSpatial Query Enhancements
> > > > > >- Improve the table status and segment file writing
> > > > > >- Improve the carbon CDC performance and introduce APIs to
> > UPSERT,
> > > > > > DELETE, UPDATE and DELETE
> > > > > >- Improvements clean file and rename performance
> > > > > >
> > > > > > *2. The tag to be voted upon:* apache-carbondata-2.2.0-rc2
> > > > > >
> > > https://github.com/apache/carbondata/tree/apache-carbondata-2.2.0-rc2
> > > > > >
> > > > > > Commit: c3a908b51b2f590eb76eb4f4d875cd568dbece40
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/carbondata/commit/c3a908b51b2f590eb76eb4f4d875cd568dbece40
> > > > > >
> > > > > >
> > > > > > *3. The artifacts to be voted on are located here:*
> > > > > > https://dist.apache.org/repos/dist/dev/carbondata/2.2.0-rc2
> > > > > >
> > > > > > *4. A staged Maven repository is available for review at:*
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://repository.apache.org/content/repositories/orgapachecarbondata-1071/
> > > > > >
> > > > > >
> > > > > > Please vote on releasing this package as Apache CarbonData 2.2.0,
> > > The
> > > > > vote
> > > > > > will be open for the next 72 hours and passes if a majority of at
> > > least
> > > > > > three +1
> > > > > > PMC votes are cast.
> > > > > >
> > > > > > [ ] +1 Release this package as Apache CarbonData 2.2.0
> > > > > >
> > > > > > [ ] 0 I don't feel strongly about it, but I'm okay with the
> release
> > > > > >
> > > > > > [ ] -1 Do not release this package because...
> > > > > >
> > > > > >
> > > > > > Regards,
> > > > > > Akash R Nilugal
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: [VOTE] Apache CarbonData 2.2.0(RC1) release

2021-07-07 Thread Liang Chen
-1, there are some open issues that need to be considered.

Regards
Liang

Kumar Vishal wrote on Tuesday, 6 July 2021 at 6:14 PM:

> -1
> Pls consider pr 4148
> Regards
> Kumar Vishal
>
> On Tue, 6 Jul 2021 at 12:45 PM, Akash Nilugal 
> wrote:
>
> > Hi All,
> >
> > I submit the *Apache CarbonData 2.2.0(RC1) *for your vote.
> >
> >
> >
> > *1. Release Notes:*
> >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12347869=Html=12320220=Create_token=A5KQ-2QAV-T4JA-FDED_386c7cf69a9d53cc8715137e7dba91958dabef9b_lin
> >
> > *Some key features and improvements in this release:*
> >
> >- Integrate Carbondata with spark-3.1
> >- Leverage Secondary Index till segment level with SI as datamap and
> SI
> > with plan rewrite
> >- Make Secondary Index as a coarse grain datamap and use secondary
> > indexes for Presto queries
> >- Support rename SI table
> >- Local sort Partition Load and Compaction improvement
> >- GeoSpatial Query Enhancements
> >- Improve the table status and segment file writing
> >
> > *2. The tag to be voted upon*: apache-carbondata-2.2.0-rc1
> > 
> >
> > *Commit: *d4e5d2337164b34fa19a42a40c03da26ff65ab9e
> > <
> >
> >
> https://github.com/apache/carbondata/commit/d4e5d2337164b34fa19a42a40c03da26ff65ab9e
> > >
> >
> >
> > *3. The artifacts to be voted on are located here:*
> > https://dist.apache.org/repos/dist/dev/carbondata/2.2.0-rc1/
> >
> > *4. A staged Maven repository is available for review at:*
> >
> >
> https://repository.apache.org/content/repositories/orgapachecarbondata-1070/
> >
> >
> > Please vote on releasing this package as Apache CarbonData 2.2.0,  The
> > vote will
> > be open for the next 72 hours and passes if a majority of at least three
> +1
> > PMC votes are cast.
> >
> > [ ] +1 Release this package as Apache CarbonData 2.2.0
> >
> > [ ] 0 I don't feel strongly about it, but I'm okay with the release
> >
> > [ ] -1 Do not release this package because...
> >
> >
> > Regards,
> > Akash R Nilugal
> >
>


[ANNOUNCE] Akash R Nilugal as new PMC for Apache CarbonData

2021-04-11 Thread Liang Chen
Hi


We are pleased to announce Akash R Nilugal as a new PMC member for Apache
CarbonData.


Congrats to Akash R Nilugal!


Apache CarbonData PMC


Re: Looking to contribute to carbondata

2021-04-05 Thread Liang Chen
Hi Pratyaksh

Welcome to the Apache CarbonData community; we will discuss with you and help
you quickly become familiar with CarbonData.
One suggestion: please first join the dev mailing list and check the quick
start document.

Regards
Liang

Pratyaksh Sharma wrote
> Hi everyone,
> 
> I am looking to contribute to this project. I tried going through the
> jiras
> but could not find any jira with label 'newBie' or something similar. So
> just wanted to check if we have any such label that a new contributor can
> use to search basic tasks and get started?
> 
> If not, can someone point me to some appropriate jira so that I may pick
> it
> up? Any leads are appreciated.
> 
> My jira id - pratyakshsharma.







Re: [VOTE] Apache CarbonData 2.1.1(RC2) release

2021-03-29 Thread Liang Chen
+1(binding)

Regards
Liang





Re: Apply to open 'Issues' tab in Apache CarbonData github

2021-03-28 Thread Liang Chen
Hi All

The GitHub Issues tab has been enabled.

Regards
Liang






Re: Apply to open 'Issues' tab in Apache CarbonData github

2021-03-18 Thread Liang Chen
I will apply for the Issues tab via INFRA.

Regards
Liang





Re: DISCUSSION: propose to activate "Issues" of https://github.com/apache/carbondata

2021-03-18 Thread Liang Chen
Hi

The github issues tab will replace JIRA.

Regards
Liang


Ajantha Bhat wrote
> Hi,
> 
> After opening github issues tab, are we going to stop using JIRA?
> If we keep both, then when to use JIRA and when to use issues?
> 
> Also, as we have a Slack channel now, if users face issues they can directly
> discuss them in Slack for quick support.
> 
> Thanks,
> Ajantha
> 
> On Thu, 18 Mar 2021 at 5:29 pm, Liang Chen wrote:
> 
>> Hi
>>
>> As you know,  for better managing community, i propose to put "Issues,
>> Pull
>> Request, Code" together and request Apache INFRA to activate "Issues" of
>> github.
>>
>> Open discussion, please input your comments.
>>
>> Regards
>> Liang
>>







DISCUSSION: propose to activate "Issues" of https://github.com/apache/carbondata

2021-03-18 Thread Liang Chen
Hi

As you know, for better management of the community, I propose to put "Issues, Pull
Requests, Code" together and request Apache INFRA to activate "Issues" on
GitHub.

This is an open discussion; please share your comments.

Regards
Liang


Re: [VOTE] Apache CarbonData 2.1.1(RC1) release

2021-03-18 Thread Liang Chen
-1, we need to fix some major defects first.

Regards
Liang





[ANNOUNCE] Ajantha as new PMC for Apache CarbonData

2020-11-19 Thread Liang Chen
Hi


We are pleased to announce Ajantha as a new PMC member for Apache CarbonData.


Congrats to Ajantha!



The Apache CarbonData PMC


Re: [VOTE] Apache CarbonData 2.1.0(RC2) release

2020-11-05 Thread Liang Chen
+1(binding)

Regards
Liang





[ANN] Indhumathi as new Apache CarbonData committer

2020-10-06 Thread Liang Chen
Hi


We are pleased to announce that the PMC has invited Indhumathi as a new

Apache CarbonData committer, and the invite has been accepted!


Congrats to Indhumathi and welcome aboard.


Regards

The Apache CarbonData PMC


Re: [VOTE] Apache CarbonData 2.1.0(RC1) release

2020-10-06 Thread Liang Chen
Hi

-1 from my side; there are some open PRs that need to be considered.

Regards
Liang



kunalkapoor wrote
> Hi All,
> 
> I submit the Apache CarbonData 2.1.0(RC1) for your vote.
> 
> 
> *1.Release Notes:*
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12347868==12320220=Create_token=A5KQ-2QAV-T4JA-FDED_e759c117bdddcf70c718e535d9f3cea7e882dda3_lout
> 
> *Some key features and improvements in this release:*
> 
>- Support Float and Decimal in the Merge Flow
>- Implement delete and update feature in carbondata SDK.
>- Support array with SI
>- Support IndexServer with Presto Engine
>- Insert from stage command support partition table.
>- Implementing a new Reindex command to repair the missing SI Segments
>- Support Change Column Comment
> 
>  *2. The tag to be voted upon* : apache-carbondata-2.1.0-rc1
> https://github.com/apache/carbondata/tree/apache-carbondata-2.1.0-rc1;
> 
> Commit: acef2998bcdd10204cdabf0dcdb123bbd264f48d
> https://github.com/apache/carbondata/commit/acef2998bcdd10204cdabf0dcdb123bbd264f48d;
> 
> *3. The artifacts to be voted on are located here:*
> https://dist.apache.org/repos/dist/dev/carbondata/2.1.0-rc1/
> 
> *4. A staged Maven repository is available for review at:*
> https://repository.apache.org/content/repositories/orgapachecarbondata-1064
> 
> 
> Please vote on releasing this package as Apache CarbonData 2.1.0,
> The vote will be open for the next 72 hours and passes if a majority of at
> least three +1 PMC votes are cast.
> 
> [ ] +1 Release this package as Apache CarbonData 2.1.0
> 
> [ ] 0 I don't feel strongly about it, but I'm okay with the release
> 
> [ ] -1 Do not release this package because...
> 
> 
> Regards,
> Kunal Kapoor







Re: Regarding Carbondata Benchmarking & Feature presentation

2020-09-22 Thread Liang Chen
Hi

Great.
Happy to see more and more companies using Apache CarbonData.

Regards
Liang


Vimal Das Kammath wrote
> Hi Carbondata Team,
> 
> I am working on proposing Carbondata to the Data Analytics team in Uber.
> It
> will be great if any of you can share the latest benchmarking and
> feature/design presentation.
> 
> Regards,
> Vimal







Re: [discuss]CarbonData update operation enhance

2020-09-22 Thread Liang Chen
Hi

Thank you for starting this discussion.
This proposal is for improving data update performance, right?
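
Regarding Solution 1 quoted below, the merged traversal could look roughly like
this (an illustrative sketch only, with hypothetical variable names, not the
actual CarbonData code):

// one pass over the segment directory instead of two getUpdateDeltaFilesList calls
val allSegmentFiles = FileFactory.getCarbonFile(segmentPath).listFiles()
val deltaFiles = allSegmentFiles.filter(_.getName.endsWith(CarbonCommonConstants.UPDATE_DELTA_FILE_EXT))
val indexFiles = allSegmentFiles.filter(_.getName.endsWith(CarbonCommonConstants.UPDATE_INDEX_FILE_EXT))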

Regards
Liang


Linwood wrote
> *[Background]*
> The update operation cleans up delta files before the update (see
> cleanUpDeltaFiles(carbonTable, false)). It loops over the metadata path
> and segment path many times. When there are too many files, the overhead
> increases and the update takes longer.
> 
> *[Motivation & Goal]*
> During the update process, reduce the loop traversal, or move
> cleanUpDeltaFiles
> to another method.
> 
> *[Modification]*
> There are some solutions, as follows.
> 
> Solution 1:
> 
> cleanUpDeltaFiles contains get-files calls that share the same logic, such as
> updateStatusManager.getUpdateDeltaFilesList(segment,
> false, CarbonCommonConstants.UPDATE_DELTA_FILE_EXT, true,
> allSegmentFiles, true) and
> updateStatusManager.getUpdateDeltaFilesList(segment,
> false, CarbonCommonConstants.UPDATE_INDEX_FILE_EXT, true,
> allSegmentFiles, true). They differ only in file type, but they loop over the
> segment path twice. We can merge them.
> 
> Solution 2:
> 
> Based on Solution 1, use Spark or MapReduce to hand the tasks over to other nodes.
> 
> Solution 3:
> 
> Submit cleanUpDeltaFiles as another task, and process it in the early
> morning
> or when the cluster is not busy.
> 
> Solution 4:
> 
> Establish a garbage collection bin, which provides some interfaces for our
> program to determine when files enter the garbage collection bin and how
> to
> deal with them.
> 
> Please vote for all solutions.
> 
> Best Regards,
> LinWood
> 
> 
> 
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/







Re: Apache CarbonData 2.0 release webinar(3rd June, 19:30-21:00 China time)

2020-06-03 Thread Liang Chen

 





Re: [VOTE] Apache CarbonData 2.0.1(RC1) release

2020-06-01 Thread Liang Chen
+1(binding)

Regards
Liang


kunalkapoor wrote
> Hi All,
> 
> I submit the Apache CarbonData 2.0.1(RC1) for your vote.
> 
> 
> *1.Release Notes:*
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220=12347870
> 
>  *2. The tag to be voted upon* :
> apache-carbondata-2.0.1-rc1
> https://github.com/apache/carbondata/tree/apache-carbondata-2.0.1-rc1;
> 
> *3. The artifacts to be voted on are located here:*
> https://dist.apache.org/repos/dist/dev/carbondata/2.0.1-rc1/
> 
> *4. A staged Maven repository is available for review at:*
> https://repository.apache.org/content/repositories/orgapachecarbondata-1063/
> 
> *5. Release artifacts are signed with the following key:*
> https://people.apache.org/keys/committer/kunalkapoor.asc
> 
> Please vote on releasing this package as Apache CarbonData 2.0.1,
> The vote will be open for the next 4 hours because this is a patch release
> and passes if a majority of at least three +1 PMC votes are cast.
> 
> [ ] +1 Release this package as Apache CarbonData 2.0.1
> 
> [ ] 0 I don't feel strongly about it, but I'm okay with the release
> 
> [ ] -1 Do not release this package because...
> 
> 
> Regards,
> Kunal Kapoor







Apache CarbonData 2.0 release webinar(3rd June, 19:30-21:00 China time)

2020-05-29 Thread Liang Chen
Hi All

Apache CarbonData 2.0 release webinar(3rd June, 19:30-21:00 China time) at
:
https://www.slidestalk.com/m/191


2.0 Release notes :
https://cwiki.apache.org/confluence/display/CARBONDATA/Apache+CarbonData+2.0.0+Release

Please sign up and join us for discussion.

Regards
Liang


Re: [VOTE] Apache CarbonData 2.0.0(RC3) release

2020-05-18 Thread Liang Chen
+1(binding) from myside.

Regards
Liang


kunalkapoor wrote
> Hi All,
> 
> I submit the Apache CarbonData 2.0.0(RC3) for your vote.
> 
> 
> *1.Release Notes:*
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12346046=Html=12320220
> 
> *Some key features and improvements in this release:*
> 
>- Adapt to SparkSessionExtensions
>- Support integration with spark 2.4.5
>- Support heterogeneous format segments in carbondata
>- Support write Flink streaming data to Carbon
>- Insert from stage command support partition table.
>- Support secondary index on carbon table
>- Support query of stage files
>- Support TimeBased Cache expiration using ExpiringMap
>- Improve insert into performance and decrease memory foot print
>- Support PyTorch and TensorFlow
> 
>  *2. The tag to be voted upon* : apache-carbondata-2.0.0-rc3
> https://github.com/apache/carbondata/tree/apache-carbondata-2.0.0-rc3;
> 
> Commit: 29d78b78095ad02afde750d89a0e44f153bcc0f3
> https://github.com/apache/carbondata/commit/29d78b78095ad02afde750d89a0e44f153bcc0f3;
> 
> *3. The artifacts to be voted on are located here:*
> https://dist.apache.org/repos/dist/dev/carbondata/2.0.0-rc3/
> 
> *4. A staged Maven repository is available for review at:*
> https://repository.apache.org/content/repositories/orgapachecarbondata-1062/
> 
> *5. Release artifacts are signed with the following key:*
> https://people.apache.org/keys/committer/kunalkapoor.asc
> 
> 
> Please vote on releasing this package as Apache CarbonData 2.0.0,
> The vote will be open for the next 72 hours and passes if a majority of at
> least three +1 PMC votes are cast.
> 
> [ ] +1 Release this package as Apache CarbonData 2.0.0
> 
> [ ] 0 I don't feel strongly about it, but I'm okay with the release
> 
> [ ] -1 Do not release this package because...
> 
> 
> Regards,
> Kunal Kapoor







Re: Carbon over-use cluster resources

2020-04-14 Thread Liang Chen
OK, thank you for reporting this issue; let us look into it.
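
In the meantime, one way to bound the thread count described below (just a
mitigation suggestion, not a fix) is to keep tasks x producer threads within
the executor's cores, for example:

import org.apache.carbondata.core.util.CarbonProperties

// property name taken from the report below; the value is only an example
CarbonProperties.getInstance().addProperty("carbon.number.of.cores.while.loading", "2")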

Regards
Liang


Manhua Jiang wrote
> Hi All,
> Recently, I found that carbon over-uses cluster resources. Generally, the design
> of the carbon workflow does not act like a common Spark task, which only does one
> small piece of work in one thread; instead, the task has its own mind/logic.
> 
> For example:
> 1. Launch carbon with --num-executors=1 but set
> carbon.number.of.cores.while.loading=10;
> 2. A no_sort table with multi-block input (N iterators),
> for example: carbon will start N tasks in parallel. And in each task the
> CarbonFactDataHandlerColumnar has model.getNumberOfCores() (let's say C)
> threads in its ProducerPool, so in total it launches N*C threads. ==> This is the
> case that makes me treat this as a serious problem: too many threads block the
> executor from sending heartbeats, and it gets killed.
> 
> So, the over-use is related to the usage of thread pools.
> 
> This would affect the cluster's overall resource usage and may lead to wrong
> performance results.
> 
> I hope this gets your notice while fixing or writing new code.







Re: [VOTE] Apache CarbonData 2.0.0(RC1) release

2020-04-02 Thread Liang Chen
Hi

Thanks for preparing 2.0.0.
For RC1, my comment is: -1 (binding).
The following open issues should be considered in 2.0.0:

https://github.com/apache/carbondata/pull/3675
https://github.com/apache/carbondata/pull/3687
https://github.com/apache/carbondata/pull/3682
https://github.com/apache/carbondata/pull/3691
https://github.com/apache/carbondata/pull/3689
https://github.com/apache/carbondata/pull/3686
https://github.com/apache/carbondata/pull/3683
https://github.com/apache/carbondata/pull/3676
https://github.com/apache/carbondata/pull/3690
https://github.com/apache/carbondata/pull/3688
https://github.com/apache/carbondata/pull/3639
https://github.com/apache/carbondata/pull/3659
https://github.com/apache/carbondata/pull/3669

Regards
Liang

kunalkapoor wrote
> Hi All,
> 
> I submit the Apache CarbonData 2.0.0(RC1) for your vote.
> 
> 
> *1.Release Notes:*
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220=12346046
> 
> *Some key features and improvements in this release:*
> 
>- Adapt to SparkSessionExtensions
>- Support integration with spark 2.4.5
>- Support heterogeneous format segments in carbondata
>- Support write Flink streaming data to Carbon
>- Insert from stage command support partition table.
>- Support secondary index on carbon table
>- Support query of stage files
>- Support TimeBased Cache expiration using ExpiringMap
>- Improve insert into performance and decrease memory foot print
> 
>  *2. The tag to be voted upon* : apache-carbondata-2.0.0-rc1
> https://github.com/apache/carbondata/tree/apache-carbondata-2.0.0-rc1;
> 
> Commit: a906785f73f297b4a71c8aaeabae82ae690fb1c3
> https://github.com/apache/carbondata/commit/a906785f73f297b4a71c8aaeabae82ae690fb1c3;
> )
> 
> *3. The artifacts to be voted on are located here:*
> https://dist.apache.org/repos/dist/dev/carbondata/2.0.0-rc1/
> 
> *4. A staged Maven repository is available for review at:*
> https://repository.apache.org/content/repositories/orgapachecarbondata-1060/
> 
> *5. Release artifacts are signed with the following key:*
> https://people.apache.org/keys/committer/kunalkapoor.asc
> 
> 
> Please vote on releasing this package as Apache CarbonData 2.0.0,  The
> vote will
> be open for the next 72 hours and passes if a majority of at least three
> +1
> PMC votes are cast.
> 
> [ ] +1 Release this package as Apache CarbonData 2.0.0
> 
> [ ] 0 I don't feel strongly about it, but I'm okay with the release
> 
> [ ] -1 Do not release this package because...
> 
> 
> Regards,
> Kunal Kapoor







[ANNOUNCE] Kunal Kapoor as new PMC for Apache CarbonData

2020-03-29 Thread Liang Chen
Hi


We are pleased to announce Kunal Kapoor as a new PMC member for Apache
CarbonData.


Congrats to Kunal Kapoor!


Apache CarbonData PMC


[ANNOUNCE] Tao Li as new Apache CarbonData committer

2020-03-06 Thread Liang Chen
Hi

We are pleased to announce that the PMC has invited Tao Li as a new
Apache CarbonData
committer and the invite has been accepted!

Congrats to Tao Li and welcome aboard.

Regards
On behalf of Apache CarbonData PMC


[ANNOUNCE] Zhi Liu as new Apache CarbonData committer

2020-03-06 Thread Liang Chen
Hi


We are pleased to announce that the PMC has invited Zhi Liu as a new
Apache CarbonData
committer and the invite has been accepted!

Congrats to Zhi Liu and welcome aboard.

Regards
On behalf of Apache CarbonData PMC


Re: Apply to open 'Issues' tab in Apache CarbonData github

2019-12-22 Thread Liang Chen
Hi

+1 from my side.
One question: which issues should be raised in Apache JIRA, and which issues will
be raised in GitHub Issues?
It is better to give a clear definition.

Regards
Liang


xm_zzc wrote
> Hi community:
>   I suggest the community open the 'Issues' tab on the carbondata GitHub page; we
> can use this feature to collect information about carbondata users, like
> this: https://github.com/apache/incubator-shardingsphere/issues/234 ,
> where users can willingly add information about the company that uses carbondata,
> and we can add this info to the CarbonData website.
>   What do you think about this?







Re: DISCUSSION: removing support for Spark 2.1 and 2.2 in CarbonData 2.0

2019-12-22 Thread Liang Chen
OK from my side, +1

Regards
Liang





Re: [VOTE] Apache CarbonData 1.6.1(RC1) release

2019-10-14 Thread Liang Chen
+1

Please update the release notes accordingly.

Regards
Liang





[ANNOUNCE] Ajantha as new Apache CarbonData committer

2019-10-03 Thread Liang Chen
Hi


We are pleased to announce that the PMC has invited Ajantha as a new Apache
CarbonData committer and the invite has been accepted!

Congrats to Ajantha and welcome aboard.

Regards

Apache CarbonData PMC


[ANNOUNCE] Zhichao Zhang as new PMC for Apache CarbonData

2019-08-28 Thread Liang Chen
Hi


We are pleased to announce Zhichao Zhang as a new PMC member for Apache
CarbonData.


Congrats to Zhichao Zhang.



Regards

Apache CarbonData PMC


[ANNOUNCE] Manhua Jiang as new Apache CarbonData committer

2019-08-28 Thread Liang Chen
Hi


We are pleased to announce that the PMC has invited Manhua Jiang as a new

Apache CarbonData committer and the invite has been accepted!


Congrats to Manhua Jiang and welcome aboard.


Regards

Apache CarbonData PMC


Re: [VOTE] Apache CarbonData 1.6.0(RC3) release

2019-08-18 Thread Liang Chen
+1 from my side

regards
Liang

On Tuesday, 13 August 2019, Raghunandan S wrote:

> Hi
>
>
> I submit the Apache CarbonData 1.6.0 (RC3) for your vote.
>
>
> 1.Release Notes:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?
> projectId=12320220=12344965
>
>
> Some key features and improvements in this release:
>
>
>1. Supported Index Server to distribute the index cache and parallelise
> the
> index pruning.
>
>2. Supported incremental data loading on MV datamaps and stabilised MV.
>
>3. Supported Arrow format from Carbon SDK.
>
>4. Supported read from Hive.
>
>
>
>
> [Behaviour Changes]
>
>1. None
>
>
>  2. The tag to be voted upon : apache-CarbonData-1.6.0-rc3 (commit:
>
> 4729b4ccee18ada1898e27f130253ad06497f1fb)
>
> https://github.com/apache/carbondata/releases/tag/
> apache-CarbonData-1.6.0-rc3
> /
>
>
>
> 3. The artifacts to be voted on are located here:
>
> https://dist.apache.org/repos/dist/dev/carbondata/1.6.0-rc3/
>
>
>
> 4. A staged Maven repository is available for review at:
>
> https://repository.apache.org/content/repositories/
> orgapachecarbondata-1055/
>
>
>
> 5. Release artifacts are signed with the following key:
>
>
> *https://people.apache.org/keys/committer/raghunandan.asc*
>
>
>
> Please vote on releasing this package as Apache CarbonData 1.6.0,  The vote
>
>
> will be open for the next 72 hours and passes if a majority of
>
>
> at least three +1 PMC votes are cast.
>
>
>
> [ ] +1 Release this package as Apache CarbonData 1.6.0
>
>
> [ ] 0 I don't feel strongly about it, but I'm okay with the release
>
>
> [ ] -1 Do not release this package because...
>
>
>
> Regards,
>
> Raghunandan.
>


Test: check if the mailing list is working

2019-07-16 Thread Liang Chen



Re: there are plans to support the spark 2.4?

2019-07-08 Thread Liang Chen
Yes, the carbondata community will consider all stable spark versions for
ecosystem integration.

Regards
Liang

李斌松 wrote on Monday, 8 July 2019 at 2:43 PM:

> there are plans to support the spark 2.4?
>


Re: [VOTE] Apache CarbonData 1.5.4(RC1) release

2019-05-29 Thread Liang Chen
+1

Regards
Liang

manishgupta88 wrote
> +1
> 
> Regards
> Manish Gupta
> 
> On Mon, May 27, 2019 at 11:34 AM kanaka 

> kanakakumaravvaru@

>  wrote:
> 
>> +1
>>
>>
>>
>> --
>> Sent from:
>> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>>







[ANNOUNCE] Akash as new Apache CarbonData committer

2019-04-25 Thread Liang Chen
Hi all

We are pleased to announce that the PMC has invited Akash as new
Apache CarbonData
committer, and the invite has been accepted!

Congrats to Akash and welcome aboard.

Regards
Apache CarbonData PMC


-- 
Regards
Liang


Re: [VOTE] Apache CarbonData 1.5.3(RC1) release

2019-04-08 Thread Liang Chen
+1

Regards
Liang





Re: Support for spark 2.1.1 (HDP spark versions)

2019-02-28 Thread Liang Chen
Hi

Did you successfully compile with "mvn -DskipTests -Pspark-2.1
-Dspark.version=2.1.1 clean package" ?
I used the community Spark 2.1.1 build and it works fine for me.

Regards
Liang


satish.sidnakoppa wrote
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/file/t389/versions.png;
>  
> 
> The carbon supported spark versions and Hortonworks - HDP supported
> versions
> are different.
> I am unable to integrate carbondata and HDP .
> 
> In brief , how can I build carbondata jar to on spark 2.1.1
> Because I am working on HDP 2.6.2 that has spark 2.1.1 as MAJOR version
> whereas carbondata default support is 2.1.0
> 
> Due to this when I run carbon session creation.I get below error
> 
> val cc =
> SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession(hdfs
> path)
> and got error:
> java.lang.NoClassDefFoundError:
> org/apache/spark/sql/catalyst/CatalystConf 
> 
> 
> 
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/







Re: [Discussion]read latest schema in case of external table and file format

2019-02-04 Thread Liang Chen
Hi 

Can you explain which scenario will generate two carbondata files with
different schema?
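
For reference, my understanding of the proposed behaviour, as a sketch only (the
table name and folder path below are made up, and the external table DDL follows
the existing carbondata syntax):

// assumes 'carbon' is a CarbonSession created as per the quick-start guide;
// the folder contains two carbondata files written with different schemas:
//   file1: (a, b, c)        file2: (a, b, c, d, e)
carbon.sql("CREATE EXTERNAL TABLE mixed_schema STORED BY 'carbondata' " +
  "LOCATION 'hdfs://localhost:9000/carbon/mixed_folder'")
// proposed: read with the latest schema (a, b, c, d, e) instead of failing;
// rows coming from file1 would simply return null for columns d and e
carbon.sql("SELECT a, d, e FROM mixed_schema").show()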

Regards
Liang


akashrn5 wrote
> Hi dev,
> 
> Currently we have a validation that if there are two carbondata files in a
> location with different schema, then we fail the query. I think there is
> no
> need to fail. If you see the parquet behavior also we cna understand.
> 
> Here i think failing is not good, we can read the latets schema from
> latest
> carbondata file in the given location and based on that read all the files
> and give query output. For the columns which are not present in some data
> files, it wil have null values for the new column.
> 
> But here basically we do not merge schema. we can maintain the same now
> also, only thing is can take latest schma.
> 
> for example:
> 1. one data file with columns a,b and c. 2nd file is with columns
> a,b,c,d,e. then can read and create table with 5 columns or 3 columns
> which
> ever is latest and create table(This will be when user does not specify
> schema). If he species table will be created with specified schema.
> 
> I have created a jira for this
> https://issues.apache.org/jira/browse/CARBONDATA-3287
> If any input, please let me know.
> 
> Regards,
> Akash







Re: [VOTE] Apache CarbonData 1.5.2(RC2) release

2019-02-04 Thread Liang Chen
Hi

+1

Regards
Liang


sraghunandan wrote
> Hi
> 
> 
> I submit the Apache CarbonData 1.5.2 (RC2) for your vote.
> 
> 
> 1.Release Notes:
> 
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=12344321
> 
> 
> Some key features and improvements in this release:
> 
> 
>1. Presto Enhancements like supporting Hive metastore and stabilising
> existing Presto features
> 
>2. Supported Range sort for faster data loading and improved point
> query
> performance.
> 
>3. Supported Compaction for no-sort loaded segments
> 
>4. Supported rename of column names
> 
>5. Supported GZIP compressor for CarbonData files.
> 
>6. Supported map data type from DDL.
> 
> 
> [Behavior Changes]
> 
>1. If user doesn’t specify sort columns during table creation, default
> sort scope is set to no-sort during data loading
> 
> 
>  2. The tag to be voted upon : apache-carbondata-1.5.2-rc2 (commit:
> 
> 9e0ff5e4c06fecd2dc9253d6e02093f123f2e71b)
> 
> https://github.com/apache/carbondata/releases/tag/apache-carbondata-1.5.2-rc2
> 
> 
> 
> 3. The artifacts to be voted on are located here:
> 
> https://dist.apache.org/repos/dist/dev/carbondata/1.5.2-rc2/
> 
> 
> 
> 4. A staged Maven repository is available for review at:
> 
> https://repository.apache.org/content/repositories/orgapachecarbondata-1038/
> 
> 
> 
> 5. Release artifacts are signed with the following key:
> 
> 
> *https://people.apache.org/keys/committer/raghunandan.asc*
> 
> 
> 
> Please vote on releasing this package as Apache CarbonData 1.5.2,  The
> vote
> 
> 
> will be open for the next 72 hours and passes if a majority of
> 
> 
> at least three +1 PMC votes are cast.
> 
> 
> 
> [ ] +1 Release this package as Apache CarbonData 1.5.2
> 
> 
> [ ] 0 I don't feel strongly about it, but I'm okay with the release
> 
> 
> [ ] -1 Do not release this package because...
> 
> 
> 
> Regards,
> 
> Raghunandan.







Re: New git: https://gitbox.apache.org/repos/asf?p=carbondata.git

2019-01-12 Thread Liang Chen
Hi

Please use the command below:
git remote set-url apache https://gitbox.apache.org/repos/asf/carbondata

Regards
Liang

Liang Chen wrote
> Hi
> 
> The below is my steps, please take it as reference : 
> 
> ChenLiangs-MAC:mergepr apple$ git remote -v
> apachehttps://git-wip-us.apache.org/repos/asf/carbondata (fetch)
> apachehttps://git-wip-us.apache.org/repos/asf/carbondata (push)
> githubhttps://github.com/apache/carbondata (fetch)
> githubhttps://github.com/apache/carbondata (push)
> 
> ChenLiangs-MAC:mergepr apple$ git remote set-url apache
> https://gitbox.apache.org/repos/asf/carbondata
> ChenLiangs-MAC:mergepr apple$ git remote -v
> apachehttps://gitbox.apache.org/repos/asf/carbondata (fetch)
> apachehttps://gitbox.apache.org/repos/asf/carbondata (push)
> githubhttps://github.com/apache/carbondata (fetch)
> githubhttps://github.com/apache/carbondata (push)
> 
> 
> Liang Chen-2 wrote
>> Hi all
>> 
>> Please update your local git with the new address(
>> https://gitbox.apache.org/repos/asf?p=carbondata.git), otherwise, you
>> can't
>> push new PR.
>> 
>> Regards
>> Liang
> 
> 
> 
> 
> 
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/







Re: New git: https://gitbox.apache.org/repos/asf?p=carbondata.git

2019-01-12 Thread Liang Chen
Hi

Below are my steps; please take them as a reference:

ChenLiangs-MAC:mergepr apple$ git remote -v
apache  https://git-wip-us.apache.org/repos/asf/carbondata (fetch)
apache  https://git-wip-us.apache.org/repos/asf/carbondata (push)
github  https://github.com/apache/carbondata (fetch)
github  https://github.com/apache/carbondata (push)

ChenLiangs-MAC:mergepr apple$ *git remote set-url apache
https://gitbox.apache.org/repos/asf?p=carbondata.git*
ChenLiangs-MAC:mergepr apple$ git remote -v
apache  https://gitbox.apache.org/repos/asf?p=carbondata.git (fetch)
apache  https://gitbox.apache.org/repos/asf?p=carbondata.git (push)
github  https://github.com/apache/carbondata (fetch)
github  https://github.com/apache/carbondata (push)


Liang Chen-2 wrote
> Hi all
> 
> Please update your local git with the new address(
> https://gitbox.apache.org/repos/asf?p=carbondata.git), otherwise, you
> can't
> push new PR.
> 
> Regards
> Liang







New git: https://gitbox.apache.org/repos/asf?p=carbondata.git

2019-01-12 Thread Liang Chen
Hi all

Please update your local git with the new address(
https://gitbox.apache.org/repos/asf?p=carbondata.git), otherwise, you can't
push new PR.

Regards
Liang


Re: [DISCUSS] Move to gitbox as per ASF infra team mail

2019-01-11 Thread Liang Chen
Hi all

Thank you all for your votes.

@JB, thanks for your kind help. I will create an INFRA ticket and request that
the migration be done this weekend.

Regards
Liang


Liang Chen wrote
> Hi all,
> 
> Background :
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/NOTICE-Mandatory-migration-of-git-repositories-to-gitbox-apache-org-td72614.html
> 
> Apache CarbonData git repository is in git-wip-us server and it will be
> decommissioned, ASF infra is proposing to move to gitbox. This discussion
> is
> for getting consensus, please discuss and vote.
> 
> Regards
> Liang
> 
> 
> 
> 
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/







[DISCUSS] Move to gitbox as per ASF infra team mail

2019-01-04 Thread Liang Chen
Hi all,

Background :
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/NOTICE-Mandatory-migration-of-git-repositories-to-gitbox-apache-org-td72614.html

The Apache CarbonData git repository is on the git-wip-us server and it will be
decommissioned. ASF infra is proposing to move to gitbox. This discussion is
for getting consensus; please discuss and vote.

Regards
Liang






[ANNOUNCE] Chuanyin Xu as new PMC for Apache CarbonData

2019-01-01 Thread Liang Chen
Hi

We are pleased to announce that Chuanyin Xu as new PMC for Apache CarbonData
.

Congrats to Chuanyin Xu!

Apache CarbonData PMC


Re: [DISCUSSION] Optimize the properties documentation or comments

2018-12-15 Thread Liang Chen
+1

Regards
Liang

xubo245 wrote
> Optimize the properties documentation or comments:
> Some properties have not documentation or comments, which will not easy to
> understand for user.
> We should add properties documentation or comments.
> 
> Unify documentation:
> Some properties have not documentation or comments in code such as
> org.apache.carbondata.core.constants.CarbonCommonConstants , but it has
> some
> documentation or comments on .md file, so we should unify it.
> 
> JIRA:
> https://issues.apache.org/jira/browse/CARBONDATA-3170
> 
> 
> 
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/







Re: [Discussion] Make 'no_sort' as default sort_scope and keep sort_columns as 'empty' by default

2018-12-15 Thread Liang Chen
Hi

First, let me make sure I understand your proposal. You mean:
1. If the user defines "sort_columns=columns": all behaviors stay the same as
today, with no change at all (most users will set this option when creating a
carbondata table).
2. If the user doesn't define "sort_columns": the current default behavior is
that all dimension columns are selected as sort_columns and sort_scope is
local_sort. *You propose to change this default behavior to no_sort, right?*

If yes, I agree with this proposal, and I also propose to remove the "empty
sort_columns" option. *It would be easier for users to understand: if
sort_columns is defined, use local_sort; if sort_columns is not defined,
use no_sort.*
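
To make the two cases concrete, a small spark-shell style sketch (table and column
names are made up; the second case shows the proposed default, not the current one):

// assumes 'carbon' is a CarbonSession created as per the quick-start guide

// case 1: user defines sort_columns -> behaviour unchanged, local_sort is used
carbon.sql("CREATE TABLE sales_sorted (id INT, city STRING, amount DOUBLE) " +
  "STORED BY 'carbondata' TBLPROPERTIES ('sort_columns'='id,city')")

// case 2: user does not define sort_columns -> the proposed default is no_sort,
// instead of selecting all dimension columns and doing a local_sort
carbon.sql("CREATE TABLE sales_nosort (id INT, city STRING, amount DOUBLE) " +
  "STORED BY 'carbondata'")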

Regards
Liang


Ajantha Bhat wrote
> Hi all,
> Currently in carbondata, we have 'local_sort' as default sort_scope and by
> default, all the dimension columns are selected for sort_columns.
> This will slow down the data loading.
> *To give the best performance benefit to user by default values, *
> we can change sort_scope to 'no_sort' and stop using all dimensions for
> sort_columns by default.
> Also if sort_columns are specified but sort_scope is not specified by the
> user, implicitly need to consider scort_scope as 'local_sort'.
> These default values are applicable for carbonsession, spark file format
> and SDK also. (all will have the same behavior)
> 
> With these changes below is the performance results of TPCH queries on
> 500GB data
> 
> 
> 
> ** Load time is improved nearly by 4 times. * total Query time by all
> queries is improved. (50% of queries are faster with no_sort, other 50%
> queries are slightly degraded or same. overall better performance)*
> Also when I did this change, I found few major issues from existing code
> in
> 'no_sort' and empty sort_columns flow. I have fixed that also.
> Below are the issues found,
> 
> 
> 
> 
> *[CARBONDATA-3162] Range filters don't remove null values for no_sort
> direct dictionary dimension columns. [CARBONDATA-3163] If table has
> different time format, for no_sort columns data goes as bad record (null)
> for second table when loaded after first table.[CARBONDATA-3164] During
> no_sort, exception happened at converter step is not reaching to user.
> same
> problem in SDK and spark file format flow also.Also fixed multiple test
> case issues.*
> I have already opened a PR for fixing these issues.
> https://github.com/apache/carbondata/pull/2966
> 
> Let me know if any suggestions about these changes.
> 
> Thanks,
> Ajantha







Re: [DISCUSS] Support transactional table in SDK

2018-12-07 Thread Liang Chen
Hi

Good idea, thank you for starting this discussion.

I agree with Ravi's comments; we need to double-check some limitations after
introducing the feature.

Flink and Kafka integration can be discussed later.
For using the SDK to write new data into an existing carbondata table, some
questions:
1. How do we ensure the writer follows the same index and dictionary policy as
the existing table?
2. Can you please help me understand this proposal further: what valuable
scenarios require this feature?


After having an online segment, one can use this feature to implement
Apache Flink-CarbonData integration, or Apache KafkaStream-CarbonData
integration, or just use the SDK to write new data to an existing CarbonData
table; the integration level can be the same as the current
Spark-CarbonData integration.

Regards
Liang





[ANNOUNCE] Bo Xu as new Apache CarbonData committer

2018-12-07 Thread Liang Chen
Hi all

We are pleased to announce that the PMC has invited Bo Xu as new
Apache CarbonData
committer, and the invite has been accepted!

Congrats to Bo Xu and welcome aboard.

Regards
Apache CarbonData PMC


Re: [VOTE] Apache CarbonData 1.5.1(RC2) release

2018-12-03 Thread Liang Chen
Hi

+1

Regards
Liang





Re: [Discuss] Removing search mode

2018-11-06 Thread Liang Chen
Hi

+1, but one suggestion: in the future we can first try these alpha features
in a separate branch, and once they are confirmed, merge them into master.

Regards
Liang


akashrn5 wrote
> +1
> yes, after search mode implementation we didnt get much advantage as
> expected and simply code will be complex, i agree with likun.
> 
> 
> 
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/







Re: [VOTE] Apache CarbonData 1.5.0(RC2) release

2018-10-10 Thread Liang Chen
+1

Regards
Liang

Ravindra Pesala wrote on Wednesday, October 10, 2018 at 3:15 AM:

> Hi
>
> I submit the Apache CarbonData 1.5.0 (RC2) for your vote.
>
> 1.Release Notes:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=12341006
>
> Some key features and improvements in this release:
>
>1. Supported carbon as Spark datasource using Spark's
>Fileformat interface.
>2. Supported Spark 2.3.2 version
>3. Supported Hadoop 3.1.1 version
>4. Improved compression and performance of non dictionary columns by
>applying an adaptive encoding to them,
>5. Supported MAP datatype in carbon.
>6. Supported ZSTD compression for carbondata files.
>7. Supported C++ interfaces to read carbon through SDK API.
>8. Supported CLI tool for data summary and debug purpose.
>9. Supported BYTE and FLOAT datatypes in carbon.
>10. Limited min/max for large text by introducing a configurable limit
>to avoid file size bloat up.
>11. Introduced multithread write API in SDK to speed up loading and
>query performance.
>12. Supported min/max stats for stream row format to improve query
>performance.
>13. Many Bug fixes and stabilized carbondata.
>
>
>  2. The tag to be voted upon : apache-carbondata-1.5.0.rc2(commit:
> 935cf3a5291a12a39f8c68b32157e26b8b1ef92b)
>
> https://github.com/apache/carbondata/releases/tag/apache-carbondata-1.5.0-rc2
>
>
> 3. The artifacts to be voted on are located here:
> https://dist.apache.org/repos/dist/dev/carbondata/1.5.0-rc2/
>
>
> 4. A staged Maven repository is available for review at:
> https://repository.apache.org/content/repositories/orgapachecarbondata-1034
>
>
> 5. Release artifacts are signed with the following key:
>
> https://people.apache.org/keys/committer/ravipesala.asc
>
>
> Please vote on releasing this package as Apache CarbonData 1.5.0,  The vote
>
> will be open for the next 72 hours and passes if a majority of
>
> at least three +1 PMC votes are cast.
>
>
> [ ] +1 Release this package as Apache CarbonData 1.5.0
>
> [ ] 0 I don't feel strongly about it, but I'm okay with the release
>
> [ ] -1 Do not release this package because...
>
>
> Regards,
> Ravindra.
>


[ANNOUNCE] Raghunandan as new committer of Apache CarbonData

2018-09-26 Thread Liang Chen
Hi all

We are pleased to announce that the PMC has invited Raghunandan as new
committer of Apache CarbonData, and the invite has been accepted!

Congrats to Raghunandan and welcome aboard.

Regards
Apache CarbonData PMC


Re: [VOTE] Apache CarbonData 1.5.0(RC1) release

2018-09-26 Thread Liang Chen
Hi

I would like to see these PRs merged first: 2761 and 2759.

Regards
Liang

ravipesala wrote
> Hi
> 
> I submit the Apache CarbonData 1.5.0 (RC1) for your vote.
> 
> 1.Release Notes:
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=12341006
> 
> Some key features and improvements in this release:
> 
>1. Supported carbon as Spark datasource using Spark's
>Fileformat interface.
>2. Improved compression and performance of non dictionary columns by
>applying an adaptive encoding to them,
>3. Supported MAP datatype in carbon.
>4. Supported ZSTD compression for carbondata files.
>5. Supported C++ interfaces to read carbon through SDK API.
>6. Supported CLI tool for data summary and debug purpose.
>7. Supported BYTE and FLOAT datatypes in carbon.
>8. Limited min/max for large text by introducing a configurable limit
> to
>avoid file size bloat up.
>9. Introduced multithread write API in SDK to speed up loading and
> query
>performance.
>10. Supported min/max stats for stream row format to improve query
>performance.
>11. Many Bug fixes and stabilized carbondata.
> 
> 
>  2. The tag to be voted upon : apache-carbondata-1.5.0.rc1(commit:
> 2157741f1d8cf3f0418ab37e1755a8d4167141a5)
> https://github.com/apache/carbondata/releases/tag/apache-carbondata-1.5.0-rc1
> 
> 
> 3. The artifacts to be voted on are located here:
> 
> https://dist.apache.org/repos/dist/dev/carbondata/1.5.0-rc1/
> 
> 
> 4. A staged Maven repository is available for review at:
> 
> https://repository.apache.org/content/repositories/orgapachecarbondata-1033
> 
> 
> 5. Release artifacts are signed with the following key:
> 
> https://people.apache.org/keys/committer/ravipesala.asc
> 
> 
> Please vote on releasing this package as Apache CarbonData 1.5.0,  The
> vote
> 
> will be open for the next 72 hours and passes if a majority of
> 
> at least three +1 PMC votes are cast.
> 
> 
> [ ] +1 Release this package as Apache CarbonData 1.5.0
> 
> [ ] 0 I don't feel strongly about it, but I'm okay with the release
> 
> [ ] -1 Do not release this package because...
> 
> 
> Regards,
> Ravindra.







Re: CarbonData Performance Optimization

2018-09-24 Thread Liang Chen
Hi

+1, great proposal. Looking forward to your pull request.
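
For readers of the quoted numbers below: the "Full Scan Query" row is a count over
every column of lineitem. A rough spark-shell sketch of such a query (standard TPCH
column names are assumed; the remaining lineitem columns would be handled the same
way):

// assumes 'carbon' is a CarbonSession with the TPCH lineitem table loaded
carbon.sql("SELECT count(l_orderkey), count(l_partkey), count(l_suppkey), " +
  "count(l_quantity), count(l_extendedprice), count(l_discount), " +
  "count(l_shipdate), count(l_comment) FROM lineitem").show()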

Regards
Liang

ravipesala wrote
> Hi,
> 
> In case of querying data using Spark  or Presto, carbondata is not well
> optimized for reading data and fill the vector. The major issues are as
> follows.
> 1. CarbonData has long method stack for reading and filling out the data
> to
> vector.
> 2. Many conditions and checks before filling out the data to vector.
> 3. Maintaining intermediate copies of data leads more CPU utilization.
> Because of the above issues, there is a high chance of missing the CPU
> cache while processing the leads to poor performance.
> 
> So here I am proposing the optimization to fill the vector without much
> method stack and condition checks and no intermediate copies to utilize
> more CPU cache.
> 
> *Full Scan queries:*
>   After decompressing the page in our V3 reader we can immediately fill
> the
> data to a vector without any condition checks inside loops. So here
> complete column page data is set to column vector in a single batch and
> gives back data to Spark/Presto.
> *Filter Queries:*
>   First, apply page level pruning using the min/max of each page and get
> the valid pages of blocklet.  Decompress only valid pages and fill the
> vector directly as mentioned in full scan query scenario.
> 
> In this method, we can also get the advantage of avoiding two times
> filtering in Spark/Presto as they do the filtering again even though we
> return the filtered data.
> 
> Please find the *TPCH performance report of updated carbon* as per the
> changes mentioned above. Please note that the changes I have done the
> changes in POC quality so it takes some time to stabilize it.
> 
> *Configurations*
> Laptop with i7 processor and 16 GB RAM.
> TPCH Data Scale: 100 GB
> No Sort with no inverted index data.
> Total CarbonData Size : 32 GB
> Total Parquet Size :  31 GB
> 
> 
> Query | Parquet | Carbon New | Carbon Old | Carbon Old vs Carbon New | Carbon New vs Parquet | Carbon Old vs Parquet
> Q1  | 101  | 96   | 128  | 25.00% | 4.95%  | -26.73%
> Q2  | 85   | 82   | 85   | 3.53%  | 3.53%  | 0.00%
> Q3  | 118  | 112  | 135  | 17.04% | 5.08%  | -14.41%
> Q4  | 473  | 424  | 486  | 12.76% | 10.36% | -2.75%
> Q5  | 228  | 201  | 205  | 1.95%  | 11.84% | 10.09%
> Q6  | 19.2 | 19.2 | 48   | 60.00% | 0.00%  | -150.00%
> Q7  | 194  | 181  | 198  | 8.59%  | 6.70%  | -2.06%
> Q8  | 285  | 263  | 275  | 4.36%  | 7.72%  | 3.51%
> Q9  | 362  | 345  | 363  | 4.96%  | 4.70%  | -0.28%
> Q10 | 101  | 92   | 93   | 1.08%  | 8.91%  | 7.92%
> Q11 | 64   | 61   | 62   | 1.61%  | 4.69%  | 3.13%
> Q12 | 41.4 | 44   | 63   | 30.16% | -6.28% | -52.17%
> Q13 | 43.4 | 43.6 | 43.7 | 0.23%  | -0.46% | -0.69%
> Q14 | 36.9 | 31.5 | 41   | 23.17% | 14.63% | -11.11%
> Q15 | 70   | 59   | 80   | 26.25% | 15.71% | -14.29%
> Q16 | 64   | 60   | 64   | 6.25%  | 6.25%  | 0.00%
> Q17 | 426  | 418  | 432  | 3.24%  | 1.88%  | -1.41%
> Q18 | 1015 | 921  | 1001 | 7.99%  | 9.26%  | 1.38%
> Q19 | 62   | 53   | 59   | 10.17% | 14.52% | 4.84%
> Q20 | 406  | 326  | 426  | 23.47% | 19.70% | -4.93%
> Full Scan Query* | 140 | 116 | 164 | 29.27% | 17.14% | -17.14%
> *Full Scan Query means count of every coumn of lineitem, In this way we
> can
> check the full scan query performance.
> 
> The above optimization is not just limited to fileformat and Presto
> integration but also improves for CarbonSession integration.
> We can further optimize carbon by the tasks(Vishal is already working on
> it) like adaptive encoding for all types of columns and storing length and
> values in separate pages in case of string datatype.Please refer
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Discussion-Carbondata-Store-size-optimization-td62283.html
> .
> 
> -- 
> Thanks & Regards,
> Ravi







Re: [DISCUSSION] Updates to CarbonData documentation and structure

2018-09-04 Thread Liang Chen
Hi Raghu

+1, all these optimizations are very good.

Regards
Liang


sraghunandan wrote
> Dear All,
> 
>  I wanted to propose some updates and changes to our current
> documentation,Please let me know your inputs and comments.
> 
> 
> 1.Split Our carbondata command into DDL and DML
> 
> 2.Add Presto and Hive integration along with Spark into quick start
> 
> 3.Add a master reference manual which lists all the commands supported in
> carbondata.This manual shall have links to DDL and DML supported
> 
> 4.Add a introduction to carbondata covering architecture,design and
> features supported
> 
> 5.Merge FAQ and troubleshooting documents into single document
> 
> 6.Add a separate md file to explain user how to navigate across our
> documentation
> 
> 7.Add the TOC (Table of Contents) to all the md files which has multiple
> sections
> 
> 8.Add list of supported properties at the beginning of each DDL or DML so
> that user knows all the properties that are supported
> 
> 9.Rewrite the configuration properties description to explain the property
> in bit more detail and also highlight when to use the command and any
> caveats
> 
> 10.ReOrder our configuration properties table to group features wise
> 
> 11.Update our webpage(carbondata.apache.org) to have a better navigation
> for documentation section
> 
> 12.Add use cases about carbondata usage and performance tuning tips
> 
> 
> Regards
> 
> Raghu







Re: error occur when I load data to s3

2018-09-03 Thread Liang Chen
Hi Kunal

Can you list all the S3 issue PRs? We may need to make a 1.4.2 patch release,
because aaron plans to use carbondata in production this month.

To aaron: first, please try master and see if it solves your problems.
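
For anyone hitting the same problem, a minimal sketch of an S3-backed CarbonSession
(bucket, keys and endpoint are placeholders; this assumes the standard s3a
configuration keys and the getOrCreateCarbonSession API from the quick-start guide):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._

val carbon = SparkSession.builder()
  .master("local")
  .appName("carbon-on-s3")
  .config("spark.hadoop.fs.s3a.access.key", "<access-key>")
  .config("spark.hadoop.fs.s3a.secret.key", "<secret-key>")
  .config("spark.hadoop.fs.s3a.endpoint", "s3.<region>.amazonaws.com")
  .getOrCreateCarbonSession("s3a://<bucket>/carbon-store")

carbon.sql("CREATE TABLE IF NOT EXISTS test_s3_table (id INT, name STRING) STORED BY 'carbondata'")
carbon.sql("LOAD DATA INPATH 'hdfs://localhost:9000/usr/carbon-s3/sample.csv' INTO TABLE test_s3_table")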

Regards
Liang

kunalkapoor wrote
> Hi aaron,
> Many issues like this have been identified in 1.4 version. Most of the
> issues have been fixed in the master code and will be released in 1.5
> version.
> Remaing fixes are in progress.
> Can you try the same scenario in 1.5(master branch).
> 
> Thanks
> Kunal Kapoor
> 
> On Mon, Sep 3, 2018, 5:57 AM aaron <

> 949835961@

>> wrote:
> 
>> *update the aws-java-sdk and hadoop-aws to below version, then
>> authorization
>> works.
>> com.amazonaws:aws-java-sdk:1.10.75.1,org.apache.hadoop:hadoop-aws:2.7.3*
>>
>> *But we still can not load data, the exception is same.
>> carbon.sql("LOAD DATA INPATH
>> 'hdfs://localhost:9000/usr/carbon-s3/sample.csv' INTO TABLE
>> test_s3_table")*
>>
>> 18/09/02 21:49:47 ERROR CarbonLoaderUtil: main Unable to unlock Table
>> lock
>> for tabledefault.test_s3_table during table status updation
>> 18/09/02 21:49:47 ERROR CarbonLoadDataCommand: main
>> java.lang.ArrayIndexOutOfBoundsException
>> at java.lang.System.arraycopy(Native Method)
>> at
>> java.io.BufferedOutputStream.write(BufferedOutputStream.java:128)
>> at
>> org.apache.hadoop.fs.s3a.S3AOutputStream.write(S3AOutputStream.java:164)
>> at
>>
>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
>> at java.io.DataOutputStream.write(DataOutputStream.java:107)
>> at
>>
>> org.apache.carbondata.core.datastore.filesystem.S3CarbonFile.getDataOutputStream(S3CarbonFile.java:111)
>> at
>>
>> org.apache.carbondata.core.datastore.filesystem.S3CarbonFile.getDataOutputStreamUsingAppend(S3CarbonFile.java:93)
>> at
>>
>> org.apache.carbondata.core.datastore.impl.FileFactory.getDataOutputStreamUsingAppend(FileFactory.java:276)
>> at
>> org.apache.carbondata.core.locks.S3FileLock.lock(S3FileLock.java:96)
>> at
>>
>> org.apache.carbondata.core.locks.AbstractCarbonLock.lockWithRetries(AbstractCarbonLock.java:41)
>> at
>>
>> org.apache.carbondata.core.locks.AbstractCarbonLock.lockWithRetries(AbstractCarbonLock.java:59)
>> at
>>
>> org.apache.carbondata.processing.util.CarbonLoaderUtil.recordNewLoadMetadata(CarbonLoaderUtil.java:247)
>> at
>>
>> org.apache.carbondata.processing.util.CarbonLoaderUtil.recordNewLoadMetadata(CarbonLoaderUtil.java:204)
>> at
>>
>> org.apache.carbondata.processing.util.CarbonLoaderUtil.readAndUpdateLoadProgressInTableMeta(CarbonLoaderUtil.java:437)
>> at
>>
>> org.apache.carbondata.processing.util.CarbonLoaderUtil.readAndUpdateLoadProgressInTableMeta(CarbonLoaderUtil.java:446)
>> at
>>
>> org.apache.spark.sql.execution.command.management.CarbonLoadDataCommand.processData(CarbonLoadDataCommand.scala:263)
>> at
>>
>> org.apache.spark.sql.execution.command.AtomicRunnableCommand.run(package.scala:92)
>> at
>>
>> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
>> at
>>
>> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
>> at
>>
>> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:67)
>> at org.apache.spark.sql.Dataset.<init>(Dataset.scala:183)
>> at
>>
>> org.apache.spark.sql.CarbonSession$$anonfun$sql$1.apply(CarbonSession.scala:107)
>> at
>>
>> org.apache.spark.sql.CarbonSession$$anonfun$sql$1.apply(CarbonSession.scala:96)
>> at
>> org.apache.spark.sql.CarbonSession.withProfiler(CarbonSession.scala:154)
>> at org.apache.spark.sql.CarbonSession.sql(CarbonSession.scala:94)
>> at $line25.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:34)
>> at $line25.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:39)
>> at $line25.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:41)
>> at $line25.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:43)
>> at $line25.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:45)
>> at $line25.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:47)
>> at $line25.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:49)
>> at $line25.$read$$iw$$iw$$iw$$iw$$iw.<init>(<console>:51)
>> at $line25.$read$$iw$$iw$$iw$$iw.<init>(<console>:53)
>> at $line25.$read$$iw$$iw$$iw.<init>(<console>:55)
>> at $line25.$read$$iw$$iw.<init>(<console>:57)
>> at $line25.$read$$iw.<init>(<console>:59)
>> at $line25.$read.<init>(<console>:61)
>> at $line25.$read$.<init>(<console>:65)
>> at $line25.$read$.<clinit>(<console>)
>> at $line25.$eval$.$print$lzycompute(<console>:7)
>> at $line25.$eval$.$print(<console>:6)
>>  

Re: [DISCUSSION] Implement file-level Min/Max index for streaming segment

2018-08-26 Thread Liang Chen
Hi

+1 for this proposal.

Regards
Liang


David CaiQiang wrote
> Hi All,
> Currently, the filter queries on the streaming table always scan all
> streaming files, even though there are no data in streaming files that
> meet
> the filter conditions.
> So I try to support file-level min/max index on streaming segment. It
> helps to reduce the task number and improve the performance of filter scan
> in some cases.
> Please check the document in JIRA:  
> https://issues.apache.org/jira/browse/CARBONDATA-2853
> Any question, suggestion?
> 
> 
> 
> -
> Best Regards
> David Cai
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/







Re: [DISCUSSION] Support Standard Spark's FileFormat interface in Carbondata

2018-08-23 Thread Liang Chen
Hi

+1. I agree with supporting the standard Spark FileFormat interface in carbondata;
it will be significantly helpful for broadening Apache CarbonData's ecosystem.
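
To illustrate what the integration would enable, a sketch of plain SparkSession usage
with no CarbonSession required (the datasource short name "carbon" is my assumption
based on the design document, and the output path is made up):

// write and read carbondata files through Spark's standard datasource API
val df = spark.range(0, 1000).selectExpr("id", "cast(id as string) as name")
df.write.format("carbon").save("/tmp/carbon_fileformat_table")
spark.read.format("carbon").load("/tmp/carbon_fileformat_table").filter("id < 10").show()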

Regards
Liang


ravipesala wrote
> Hi,
> 
> Current Carbondata has deep integration with Spark to provide
> optimizations
> in performance and also supports features like compaction, IUD, data maps
> and metadata management etc. This type of integration forces user to use
> CarbonSession instance to use carbon even for read and write operations.
> 
> So I am proposing standard spark's FileFormat implementation in carbon for
> simple integration with Spark. Please check the jira for the design
> document.
> https://issues.apache.org/jira/browse/CARBONDATA-2872
> 
> -- 
> Thanks & Regards,
> Ravindra







Re: Change the 'comment' content for column when execute command 'desc formatted table_name'

2018-08-21 Thread Liang Chen
Hi

1. I agree with likun's comments (4 points).

2. About the 'select sql' for CTAS, you can leave it; we can consider it later.

Regards
Liang

Jacky Li wrote
> Hi ZZC,
> 
> I have checked the doc in CARBONDATA-2595. I have following comments:
> 1. In the Table Basic Information section, it is better to print the Table
> Path instead of "CARBON Store Path”
> 2. For the Table Data Size  and Index Size, can you format the output in
> GB, MB, KB, etc
> 3. For the Last Update Time, can you format the output in UTC time like
> -MM-DD hh:mm:ss
> 4. In table property, I think maybe some properties are missing, like
> block size, blocklet size, long string
> 
> For implementation, I suggest to write the main logic of collecting these
> information in java so that it is easier to write tools for it. One tool
> can be this SQL command and another tool I can think of is an standalone
> java executable that  can print these information on the screen by reading
> the given table path. (We can put this standalone tool in SDK module)
> 
> Regards,
> Jacky
> 
> 
>> 在 2018年8月20日,上午11:20,xm_zzc <

> 441586683@

>> 写道:
>> 
>> Hi dev:
>>  Now I am working on this, the new format is shown in attachment, please
>> give me some feedback.
>>  There is one question: if user uses CTAS to create table, do we need to
>> show the 'select sql' in the result of 'desc formatted table'? If yes,
>> how
>> to get 'select sql'? now I just can get a non-formatted sql from
>> 'CarbonSparkSqlParser.scala' (Jacky mentioned), for example:
>> 
>> *CREATE TABLE IF NOT EXISTS test_table
>> STORED BY 'carbondata'
>> TBLPROPERTIES(
>> 'streaming'='false', 'sort_columns'='id,city',
>> 'dictionary_include'='name')
>> AS SELECT * from source_test ;*
>> 
>> The non-formatted sql I get is :
>> *SELECT*fromsource_test*
>> 
>> desc_formatted.txt
>> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/file/t133/desc_formatted.txt;
>>   
>> desc_formatted_external.txt
>> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/file/t133/desc_formatted_external.txt;
>>   
>> 
>> 
>> 
>> 
>> 
>> 
>> --
>> Sent from:
>> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>>







Re: How to look up date segment details in carbon without partition.

2018-08-13 Thread Liang Chen
Hi 

In CarbonData, the segment concept may be different from other systems:
one data load creates one segment.

Actually, carbondata currently supports partitioning with global sort as well;
you can use the date as the partition column and then check the data size
under each partition folder.
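
A small sketch of both options (table and column names are illustrative):

// assumes 'carbon' is a CarbonSession created as per the quick-start guide

// one LOAD/INSERT = one segment; per-segment details (id, status, load time, and in
// newer versions the data size) can be listed with:
carbon.sql("SHOW SEGMENTS FOR TABLE my_carbon_table").show(false)

// alternatively, partition the table by a date column and check the size of each
// date folder directly on HDFS/S3:
carbon.sql("CREATE TABLE sales_by_date (col1 STRING, col2 INT, col3 STRING) " +
  "PARTITIONED BY (dt STRING) STORED BY 'carbondata'")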

Regards
Liang

carbondata-newuser wrote
> It seems carbondata not recommend use partitionby and partitionby is not
> supported in global sort scope.
> It is very conveniently to look up how many date partition(along with the
> partition size every day) already exists in hive(save as parquet).
> In carbondata I add the date column to first sort columns in order to
> using
> global sort scope.
> But how can I look up segment corresponding date and size in carbondata.
>  
> 
> 
> 
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/







Re: [VOTE] Apache CarbonData 1.4.1(RC2) release

2018-08-13 Thread Liang Chen
Hi 

+1. Many good improvements and bug fixes.

Regards
Liang 


ravipesala wrote
> Hi
> 
> 
> I submit the Apache CarbonData 1.4.1 (RC2) for your vote.
> 
> 
> 1.Release Notes:
> 
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=12343148
> 
> Some key features and improvements in this release:
> 
>1. Supported Local dictionary to improve IO and query performance.
>2. Improved and stabilized Bloom filter datamap.
>3. Supported left outer join MV datamap(Alpha feature)
>4. Supported driver min max caching for specified columns and
>segregate block and blocklet cache.
>5. Support Flat folder structure in carbon to maintain the same folder
>structure as Hive.
>6. Supported S3 read and write on carbondata files
>7. Support projection push down for struct data type.
>8. Improved complex datatypes compression and performance through
>adaptive encoding.
>9. Many Bug fixes and stabilized carbondata.
> 
> 
>  2. The tag to be voted upon : apache-carbondata-1.4.1.rc2(commit:
> a17db2439aa51f6db7da293215f9732ffb200bd9)
> 
> https://github.com/apache/carbondata/releases/tag/apache-carbondata-1.4.1-rc2
> 
> 
> 3. The artifacts to be voted on are located here:
> 
> https://dist.apache.org/repos/dist/dev/carbondata/1.4.1-rc2/
> 
> 
> 4. A staged Maven repository is available for review at:
> 
> https://repository.apache.org/content/repositories/orgapachecarbondata-1032
> 
> 
> 5. Release artifacts are signed with the following key:
> 
> https://people.apache.org/keys/committer/ravipesala.asc
> 
> 
> Please vote on releasing this package as Apache CarbonData 1.4.1,  The
> vote
> 
> will be open for the next 72 hours and passes if a majority of
> 
> at least three +1 PMC votes are cast.
> 
> 
> [ ] +1 Release this package as Apache CarbonData 1.4.1
> 
> [ ] 0 I don't feel strongly about it, but I'm okay with the release
> 
> [ ] -1 Do not release this package because...
> 
> 
> Regards,
> Ravindra.







Re: [Discussion] Propose to upgrade the version of integration/presto from 0.187 to 0.206

2018-08-13 Thread Liang Chen
Hi

I just checked 0.207 and 0.208; they fix many memory issues, so I propose to
upgrade to 0.208 for the Apache CarbonData 1.5.0 integration.

Regards
Liang


bhavya411 wrote
> Hi Dev,
> 
> Yes, we should definitely go for the 0.206 upgrade for Presto as we are
> now
> using the dictionary_aggregation feature for optimization. The other bug
> fixes are also important for carbondata integration.
> However, they have changed the connector interface as well, so we might
> need to change our interface accordingly.
> 
> Thanks and regards
> Bhavya
> 
> On Tue, Jul 24, 2018 at 2:11 PM, Liang Chen 

> chenliang6136@

>  wrote:
> 
>> Hi Dev
>>
>> The presto community already released 0.206 last week (refer the detail
>> at
>> https://prestodb.io/docs/current/release/release-0.206.html),  this
>> release
>> fixed many issues, so propose Apache CarbonData community to upgrade to
>> the
>> latest presto version for carbondata integration.
>>
>> please provide your opinion.
>>
>> Regards
>> Liang
>>
> 
> 
> 
> -- 
> *Bhavya Aggarwal*
> CTO & Partner
> Knoldus Inc. http://www.knoldus.com/;
> +91-9910483067
> Canada - USA - India - Singapore
> https://in.linkedin.com/company/knoldus;
> https://twitter.com/Knolspeak;
> https://www.facebook.com/KnoldusSoftware/;
> https://blog.knoldus.com/;







Re: [VOTE] Apache CarbonData 1.4.1(RC1) release

2018-07-31 Thread Liang Chen
Hi

It would be better to also merge these PRs into 1.4.1:
https://github.com/apache/carbondata/pull/2588
https://github.com/apache/carbondata/pull/2565

Regards
Liang


ravipesala wrote
> Hi
> 
> 
> I submit the Apache CarbonData 1.4.1 (RC1) for your vote.
> 
> 
> 1.Release Notes:
> 
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=12343148
> 
> Some key features and improvements in this release:
> 
>1. Supported Local dictionary to improve IO and query performance.
>2. Improved and stabilized Bloom filter datamap.
>3. Supported left outer join MV datamap(Alpha feature)
>4. Supported driver min max caching for specified columns and segregate
>block and blocklet cache.
>5. Support Flat folder structure in carbon to maintain the same folder
>structure as Hive.
>6. Supported S3 read and write on carbondata files
>7. Support projection push down for struct data type.
>8. Improved complex datatypes compression and performance through
>adaptive encoding.
>9. Many Bug fixes and stabilized carbondata.
> 
> 
>  2. The tag to be voted upon : apache-carbondata-1.4.1.rc1(commit:
> 93b4bfc039791894e1bf24b4f340a6c2a29b0e60)
> 
> https://github.com/apache/carbondata/releases/tag/apache-carbondata-1.4.1.rc1
> 
> 
> 3. The artifacts to be voted on are located here:
> 
> https://dist.apache.org/repos/dist/dev/carbondata/1.4.1-rc1/
> 
> 
> 4. A staged Maven repository is available for review at:
> 
> https://repository.apache.org/content/repositories/orgapachecarbondata-1031/
> 
> 
> 5. Release artifacts are signed with the following key:
> 
> https://people.apache.org/keys/committer/ravipesala.asc
> 
> 
> Please vote on releasing this package as Apache CarbonData 1.4.1,  The
> vote
> 
> will be open for the next 72 hours and passes if a majority of
> 
> at least three +1 PMC votes are cast.
> 
> 
> [ ] +1 Release this package as Apache CarbonData 1.4.1
> 
> [ ] 0 I don't feel strongly about it, but I'm okay with the release
> 
> [ ] -1 Do not release this package because...
> 
> 
> Regards,
> Ravindra.







Re: [VOTE] Apache CarbonData 1.4.1(RC1) release

2018-07-31 Thread Liang Chen
ravipesala wrote
> Hi
> 
> 
> I submit the Apache CarbonData 1.4.1 (RC1) for your vote.
> 
> 
> 1.Release Notes:
> 
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=12343148
> 
> Some key features and improvements in this release:
> 
>1. Supported Local dictionary to improve IO and query performance.
>2. Improved and stabilized Bloom filter datamap.
>3. Supported left outer join MV datamap(Alpha feature)
>4. Supported driver min max caching for specified columns and segregate
>block and blocklet cache.
>5. Support Flat folder structure in carbon to maintain the same folder
>structure as Hive.
>6. Supported S3 read and write on carbondata files
>7. Support projection push down for struct data type.
>8. Improved complex datatypes compression and performance through
>adaptive encoding.
>9. Many Bug fixes and stabilized carbondata.
> 
> 
>  2. The tag to be voted upon : apache-carbondata-1.4.1.rc1(commit:
> 93b4bfc039791894e1bf24b4f340a6c2a29b0e60)
> 
> https://github.com/apache/carbondata/releases/tag/apache-carbondata-1.4.1.rc1
> 
> 
> 3. The artifacts to be voted on are located here:
> 
> https://dist.apache.org/repos/dist/dev/carbondata/1.4.1-rc1/
> 
> 
> 4. A staged Maven repository is available for review at:
> 
> https://repository.apache.org/content/repositories/orgapachecarbondata-1031/
> 
> 
> 5. Release artifacts are signed with the following key:
> 
> https://people.apache.org/keys/committer/ravipesala.asc
> 
> 
> Please vote on releasing this package as Apache CarbonData 1.4.1,  The
> vote
> 
> will be open for the next 72 hours and passes if a majority of
> 
> at least three +1 PMC votes are cast.
> 
> 
> [ ] +1 Release this package as Apache CarbonData 1.4.1
> 
> [ ] 0 I don't feel strongly about it, but I'm okay with the release
> 
> [ ] -1 Do not release this package because...
> 
> 
> Regards,
> Ravindra.







Re: New-bie JIRAs for new contributors

2018-07-31 Thread Liang Chen
Hi Vikash

Welcome to the Apache CarbonData community.
1. Firstly, please let me know your Apache JIRA account (email id); I will
add you as a contributor.
2. Secondly, you can run the simple example as per:
https://github.com/apache/carbondata/blob/master/docs/quick-start-guide.md
3. You can check the examples and debug them:
https://github.com/apache/carbondata/tree/master/examples

4. For new contributors: add new examples, optimize documents, write more
test cases.

Regards
Liang

vikashtalanki wrote
> Hi Team,
> 
> I am very much interested in Carbon Data project and would like to
> contribute to this. I have read the documentation & "How to contribute"
> pages. Also forked the git repo and subscribed to dev mailing lists.
> Since, I'm very much new to this, would like to start by resolving
> new-bie(easy-level) kinda JIRAs to get familiar with the process.
> Hence, I would request you to provide me links to new-bie JIRAs.
> 
> -- 
> Regards...
> 
> Vikash Talanki,
> +1 408.203.2151







[Discussion] Propose to upgrade the version of integration/presto from 0.187 to 0.206

2018-07-24 Thread Liang Chen
Hi Dev

The presto community already released 0.206 last week (see the details at
https://prestodb.io/docs/current/release/release-0.206.html). This release
fixed many issues, so I propose that the Apache CarbonData community upgrade to
the latest presto version for the carbondata integration.

Please provide your opinions.

Regards
Liang


Re: Index file cache will not work when the table has invalid segment.

2018-07-12 Thread Liang Chen
Hi

Currently, CarbonData doesn't support map data type

Regards
Liang


carbondata-newuser wrote
> Carbon version is 1.4 rc2.
> create table(
> col1 string,
> col2 int,
> col2 string,
> date string
> )
> 
> *First step:*
> insert into table carbonTest select col1,col2,col3,"20180707" from
> hiveTable2 where date="20180707";
> The col3 is a hive map type, so this insert will be failed. 
> And it will create invalid segment. (I'm not sure it is because of this).
> *second step:*
> insert into carbonTest select col1,col2,"","20180707" from hiveTable2;
> 
> Then any query to this table will access the index files time and time
> again.
> 
> 
> 
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/







Re: Question about integrating CarbonData with Presto

2018-06-14 Thread Liang Chen
Hi

Please send your questions to the mailing list (cc to the mailing list).
Currently, "Presto reading streaming carbondata tables" is not supported.
Can you share with the community why this feature is needed and what
your exact requirements are?

Regards
Liang


kevintop wrote on Wednesday, June 13, 2018 at 9:48 AM:

> Mr. Chen,
>
> Hello, I have a question about integrating Carbondata with Presto: can Presto read streaming tables in the Carbondata format?
>
> When I query a streaming table with Presto, it keeps throwing the exception below. I traced it and found that when the QueryModel is constructed, the detailInfo
> property is indeed not set for streaming tables. The exception is as follows.
>
> java.lang.RuntimeException: Could not read blocklet details
> at
> org.apache.carbondata.presto.impl.CarbonLocalInputSplit.convertSplit(CarbonLocalInputSplit.java:131)
> at
> org.apache.carbondata.presto.CarbondataPageSourceProvider.createQueryModel(CarbondataPageSourceProvider.java:139)
>
>
>
> --
> Kevin Kong
>


Updated release notes . Re: [ANNOUNCE] Apache CarbonData 1.4.0 release

2018-06-05 Thread Liang Chen
Hi

Please find the updated 1.4.0 release notes:
https://cwiki.apache.org/confluence/display/CARBONDATA/Apache+CarbonData+1.4.0+Release

Regards
Liang





Re: Support updating/deleting data for stream table

2018-06-03 Thread Liang Chen
Hi

+1 for considering solution 1 first.
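
For context, the retention use case described in the quoted mail below (keep only the
last year of data) would look roughly like this once delete works across a streaming
table's segments; a sketch assuming the existing CarbonData delete syntax is simply
reused, with made-up table and column names:

// assumes 'carbon' is a CarbonSession and stream_table has an 'event_date' column
carbon.sql("DELETE FROM stream_table WHERE event_date < '2017-06-01'")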

Regards
Liang

xm_zzc wrote
> Hi  Raghu:
>   Yep, you are right, so I said solution 1 is not very precise when there
> are still some data you want to update/delete being stored in stream
> segments, solution 2 can handle this scenario you mentioned.
>   But, in my opinion, the scenario of deleting historical data is more
> common than the one of updating data, the data size of stream table will
> grow day by day, user generally want to delete specific data to make data
> size not too large, for example, if user want to keep data for one year,
> he
> need to delete one year ago of data everyday. On the other hand, solution
> 2
> is more complicated than solution 1, we need to consider the implement of
> solution 2 in depth.
>   Based on the above reasons, Liang Chen, Jacky, David and I prefered to
> implement Solution 1 first. Is it ok for you?
>   
>   Is there any other suggestion?
> 
> 
> 
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/






