[GitHub] [carbondata] jackylk commented on a change in pull request #3495: [CARBONDATA-3532] Support Query Rollup for MV TimeSeries Queries

2020-01-29 Thread GitBox
jackylk commented on a change in pull request #3495: [CARBONDATA-3532] Support 
Query Rollup for MV TimeSeries Queries
URL: https://github.com/apache/carbondata/pull/3495#discussion_r372786096
 
 

 ##
 File path: 
datamap/mv/core/src/test/scala/org/apache/carbondata/mv/timeseries/TestMVTimeSeriesQueryRollUp.scala
 ##
 @@ -0,0 +1,259 @@
+  /*
+  * Licensed to the Apache Software Foundation (ASF) under one or more
+  * contributor license agreements.  See the NOTICE file distributed with
+  * this work for additional information regarding copyright ownership.
+  * The ASF licenses this file to You under the Apache License, Version 2.0
+  * (the "License"); you may not use this file except in compliance with
+  * the License.  You may obtain a copy of the License at
+  *
+  *http://www.apache.org/licenses/LICENSE-2.0
+  *
+  * Unless required by applicable law or agreed to in writing, software
+  * distributed under the License is distributed on an "AS IS" BASIS,
+  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  * See the License for the specific language governing permissions and
+  * limitations under the License.
+  */
+
+  package org.apache.carbondata.mv.timeseries
+
+  import org.apache.spark.sql.test.util.QueryTest
+  import org.scalatest.BeforeAndAfterAll
+
+  import org.apache.carbondata.mv.rewrite.TestUtil
+
+  class TestMVTimeSeriesQueryRollUp extends QueryTest with BeforeAndAfterAll {
+
+override def beforeAll(): Unit = {
+  drop()
+  createTable()
+  loadData("maintable")
+}
+
+override def afterAll(): Unit = {
+  drop()
+}
+
+test("test timeseries query rollup with simple projection") {
+  val result  = sql("select timeseries(projectjoindate,'day'),projectcode from maintable")
+  sql("drop datamap if exists datamap1")
+  sql("drop datamap if exists datamap2")
+  sql(
+"create datamap datamap1 on table maintable using 'mv' as " +
+"select timeseries(projectjoindate,'second'),projectcode from maintable")
+  sql(
+"create datamap datamap2 on table maintable using 'mv' as " +
+"select timeseries(projectjoindate,'hour'),projectcode from maintable")
+  val df = sql("select timeseries(projectjoindate,'day'),projectcode from maintable")
+  assert(TestUtil.verifyMVDataMap(df.queryExecution.optimizedPlan, "datamap2"))
+  checkAnswer(result, df)
+  sql("drop datamap if exists datamap1")
+  sql("drop datamap if exists datamap2")
+}
+
+test("test timeseries query rollup with simple projection with group by - scenario-1") {
+  val result  = sql("select timeseries(projectjoindate,'day'),projectcode from maintable group by timeseries(projectjoindate,'day'),projectcode")
+  sql("drop datamap if exists datamap1")
+  sql("drop datamap if exists datamap2")
+  sql(
+"create datamap datamap1 on table maintable using 'mv' as " +
+"select timeseries(projectjoindate,'second'),projectcode from maintable group by timeseries(projectjoindate,'second'),projectcode")
+  sql(
+"create datamap datamap2 on table maintable using 'mv' as " +
+"select timeseries(projectjoindate,'hour'),projectcode from maintable group by timeseries(projectjoindate,'hour'),projectcode")
+  val df = sql("select timeseries(projectjoindate,'day'),projectcode from maintable group by timeseries(projectjoindate,'day'),projectcode")
+  assert(TestUtil.verifyMVDataMap(df.queryExecution.optimizedPlan, "datamap2"))
+  checkAnswer(result, df)
+  sql("drop datamap if exists datamap1")
+  sql("drop datamap if exists datamap2")
+}
+
+test("test timeseries query rollup with simple projection with group by - scenario-2") {
+  val result  = sql("select timeseries(projectjoindate,'day'),sum(projectcode) from maintable group by timeseries(projectjoindate,'day')")
+  sql("drop datamap if exists datamap1")
+  sql("drop datamap if exists datamap2")
+  sql(
+"create datamap datamap1 on table maintable using 'mv' as " +
+"select timeseries(projectjoindate,'second'),sum(projectcode) from maintable group by timeseries(projectjoindate,'second')")
+  sql(
+"create datamap datamap2 on table maintable using 'mv' as " +
+"select timeseries(projectjoindate,'hour'),sum(projectcode) from maintable group by timeseries(projectjoindate,'hour')")
+  val df = sql("select timeseries(projectjoindate,'day'),sum(projectcode) from maintable group by timeseries(projectjoindate,'day')")
+  assert(TestUtil.verifyMVDataMap(df.queryExecution.optimizedPlan, "datamap2"))
+  checkAnswer(result, df)
+  sql("drop datamap if exists datamap1")
+  sql("drop datamap if exists datamap2")
+}
+
+test("test timeseries query rollup with simple projection with filter") {
+  val result  = sql("select timeseries(projectjoindate,'day'),projectcode from maintable where 

[GitHub] [carbondata] jackylk commented on a change in pull request #3495: [CARBONDATA-3532] Support Query Rollup for MV TimeSeries Queries

2020-01-29 Thread GitBox
jackylk commented on a change in pull request #3495: [CARBONDATA-3532] Support 
Query Rollup for MV TimeSeries Queries
URL: https://github.com/apache/carbondata/pull/3495#discussion_r372785829
 
 

 ##
 File path: 
datamap/mv/core/src/test/scala/org/apache/carbondata/mv/timeseries/TestMVTimeSeriesQueryRollUp.scala
 ##
 @@ -0,0 +1,259 @@
+  /*
+  * Licensed to the Apache Software Foundation (ASF) under one or more
+  * contributor license agreements.  See the NOTICE file distributed with
+  * this work for additional information regarding copyright ownership.
+  * The ASF licenses this file to You under the Apache License, Version 2.0
+  * (the "License"); you may not use this file except in compliance with
+  * the License.  You may obtain a copy of the License at
+  *
+  *http://www.apache.org/licenses/LICENSE-2.0
+  *
+  * Unless required by applicable law or agreed to in writing, software
+  * distributed under the License is distributed on an "AS IS" BASIS,
+  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  * See the License for the specific language governing permissions and
+  * limitations under the License.
+  */
+
+  package org.apache.carbondata.mv.timeseries
+
+  import org.apache.spark.sql.test.util.QueryTest
+  import org.scalatest.BeforeAndAfterAll
+
+  import org.apache.carbondata.mv.rewrite.TestUtil
+
+  class TestMVTimeSeriesQueryRollUp extends QueryTest with BeforeAndAfterAll {
+
+override def beforeAll(): Unit = {
+  drop()
+  createTable()
+  loadData("maintable")
+}
+
+override def afterAll(): Unit = {
+  drop()
+}
+
+test("test timeseries query rollup with simple projection") {
+  val result  = sql("select timeseries(projectjoindate,'day'),projectcode from maintable")
+  sql("drop datamap if exists datamap1")
+  sql("drop datamap if exists datamap2")
+  sql(
+"create datamap datamap1 on table maintable using 'mv' as " +
+"select timeseries(projectjoindate,'second'),projectcode from maintable")
+  sql(
+"create datamap datamap2 on table maintable using 'mv' as " +
+"select timeseries(projectjoindate,'hour'),projectcode from maintable")
+  val df = sql("select timeseries(projectjoindate,'day'),projectcode from maintable")
+  assert(TestUtil.verifyMVDataMap(df.queryExecution.optimizedPlan, "datamap2"))
+  checkAnswer(result, df)
+  sql("drop datamap if exists datamap1")
+  sql("drop datamap if exists datamap2")
+}
+
+test("test timeseries query rollup with simple projection with group by - scenario-1") {
+  val result  = sql("select timeseries(projectjoindate,'day'),projectcode from maintable group by timeseries(projectjoindate,'day'),projectcode")
+  sql("drop datamap if exists datamap1")
+  sql("drop datamap if exists datamap2")
+  sql(
+"create datamap datamap1 on table maintable using 'mv' as " +
+"select timeseries(projectjoindate,'second'),projectcode from maintable group by timeseries(projectjoindate,'second'),projectcode")
+  sql(
+"create datamap datamap2 on table maintable using 'mv' as " +
+"select timeseries(projectjoindate,'hour'),projectcode from maintable group by timeseries(projectjoindate,'hour'),projectcode")
+  val df = sql("select timeseries(projectjoindate,'day'),projectcode from maintable group by timeseries(projectjoindate,'day'),projectcode")
+  assert(TestUtil.verifyMVDataMap(df.queryExecution.optimizedPlan, "datamap2"))
 
 Review comment:
   please clone this test case and create only datamap1, and still query `select timeseries(projectjoindate,'day'),projectcode from maintable group by timeseries(projectjoindate,'day'),projectcode`, to see whether it will roll up using the second-level MV


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [carbondata] jackylk commented on a change in pull request #3495: [CARBONDATA-3532] Support Query Rollup for MV TimeSeries Queries

2020-01-29 Thread GitBox
jackylk commented on a change in pull request #3495: [CARBONDATA-3532] Support 
Query Rollup for MV TimeSeries Queries
URL: https://github.com/apache/carbondata/pull/3495#discussion_r372785035
 
 

 ##
 File path: 
datamap/mv/core/src/test/scala/org/apache/carbondata/mv/timeseries/TestMVTimeSeriesQueryRollUp.scala
 ##
 @@ -0,0 +1,259 @@
+  /*
+  * Licensed to the Apache Software Foundation (ASF) under one or more
+  * contributor license agreements.  See the NOTICE file distributed with
+  * this work for additional information regarding copyright ownership.
+  * The ASF licenses this file to You under the Apache License, Version 2.0
+  * (the "License"); you may not use this file except in compliance with
+  * the License.  You may obtain a copy of the License at
+  *
+  *http://www.apache.org/licenses/LICENSE-2.0
+  *
+  * Unless required by applicable law or agreed to in writing, software
+  * distributed under the License is distributed on an "AS IS" BASIS,
+  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  * See the License for the specific language governing permissions and
+  * limitations under the License.
+  */
+
+  package org.apache.carbondata.mv.timeseries
+
+  import org.apache.spark.sql.test.util.QueryTest
+  import org.scalatest.BeforeAndAfterAll
+
+  import org.apache.carbondata.mv.rewrite.TestUtil
+
+  class TestMVTimeSeriesQueryRollUp extends QueryTest with BeforeAndAfterAll {
+
+override def beforeAll(): Unit = {
+  drop()
+  createTable()
+  loadData("maintable")
+}
+
+override def afterAll(): Unit = {
+  drop()
+}
+
+test("test timeseries query rollup with simple projection") {
+  val result  = sql("select timeseries(projectjoindate,'day'),projectcode from maintable")
+  sql("drop datamap if exists datamap1")
+  sql("drop datamap if exists datamap2")
+  sql(
+"create datamap datamap1 on table maintable using 'mv' as " +
+"select timeseries(projectjoindate,'second'),projectcode from maintable")
+  sql(
+"create datamap datamap2 on table maintable using 'mv' as " +
+"select timeseries(projectjoindate,'hour'),projectcode from maintable")
+  val df = sql("select timeseries(projectjoindate,'day'),projectcode from maintable")
+  assert(TestUtil.verifyMVDataMap(df.queryExecution.optimizedPlan, "datamap2"))
+  checkAnswer(result, df)
+  sql("drop datamap if exists datamap1")
+  sql("drop datamap if exists datamap2")
+}
+
+test("test timeseries query rollup with simple projection with group by - scenario-1") {
+  val result  = sql("select timeseries(projectjoindate,'day'),projectcode from maintable group by timeseries(projectjoindate,'day'),projectcode")
+  sql("drop datamap if exists datamap1")
+  sql("drop datamap if exists datamap2")
+  sql(
+"create datamap datamap1 on table maintable using 'mv' as " +
+"select timeseries(projectjoindate,'second'),projectcode from maintable group by timeseries(projectjoindate,'second'),projectcode")
+  sql(
+"create datamap datamap2 on table maintable using 'mv' as " +
+"select timeseries(projectjoindate,'hour'),projectcode from maintable group by timeseries(projectjoindate,'hour'),projectcode")
+  val df = sql("select timeseries(projectjoindate,'day'),projectcode from maintable group by timeseries(projectjoindate,'day'),projectcode")
+  assert(TestUtil.verifyMVDataMap(df.queryExecution.optimizedPlan, "datamap2"))
 
 Review comment:
   please check whether the MV will be hit for the following:
   `"select timeseries(projectjoindate,'second'),projectcode from maintable group by timeseries(projectjoindate,'second'),projectcode"`
   `"select timeseries(projectjoindate,'second') from maintable group by timeseries(projectjoindate,'second')"`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [carbondata] jackylk commented on a change in pull request #3495: [CARBONDATA-3532] Support Query Rollup for MV TimeSeries Queries

2020-01-29 Thread GitBox
jackylk commented on a change in pull request #3495: [CARBONDATA-3532] Support 
Query Rollup for MV TimeSeries Queries
URL: https://github.com/apache/carbondata/pull/3495#discussion_r372784087
 
 

 ##
 File path: 
datamap/mv/plan/src/main/scala/org/apache/carbondata/mv/plans/modular/ModularPlan.scala
 ##
 @@ -96,6 +96,24 @@ abstract class ModularPlan
 _rewritten
   }
 
+  private var _rolledup: Boolean = false
 
 Review comment:
   this coding style is different from the rest of the code; please make it the same as in other places. Change to `rolledup`
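The excerpt above shows a `_rewritten` field exposed through an accessor, and the reviewer asks the new flag to follow the same convention. A minimal, hypothetical sketch of that private-flag-plus-accessor pattern follows; `PlanNode`, `LeafNode`, and `markRolledUp` are illustrative names, not CarbonData's actual API.

```scala
// Hypothetical sketch of the private-flag-plus-accessor convention
// (mirroring the `_rewritten`/`rewritten` pair visible in the excerpt).
abstract class PlanNode {
  private var _rolledUp: Boolean = false

  // Mark this node as produced by a rollup rewrite.
  def markRolledUp(): Unit = { _rolledUp = true }

  // Read-only accessor, named after the backing field minus the underscore.
  def rolledUp: Boolean = _rolledUp
}

class LeafNode extends PlanNode

val node = new LeafNode
assert(!node.rolledUp)  // flag starts false
node.markRolledUp()
assert(node.rolledUp)   // accessor reflects the private state
```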


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [carbondata] jackylk commented on a change in pull request #3495: [CARBONDATA-3532] Support Query Rollup for MV TimeSeries Queries

2020-01-29 Thread GitBox
jackylk commented on a change in pull request #3495: [CARBONDATA-3532] Support 
Query Rollup for MV TimeSeries Queries
URL: https://github.com/apache/carbondata/pull/3495#discussion_r372783817
 
 

 ##
 File path: 
datamap/mv/core/src/main/scala/org/apache/carbondata/mv/rewrite/Navigator.scala
 ##
 @@ -206,4 +312,59 @@ private[mv] class Navigator(catalog: SummaryDatasetCatalog, session: MVSession)
 }
 true
   }
+
+  /**
+   * Replace the granularity in the plan
 
 Review comment:
   please explain more: what is replaced with what?
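For context, "replace the granularity" presumably means swapping the granularity literal of a timeseries UDF node during rewrite. A toy sketch of that idea follows; the `Expr` ADT below is invented for illustration and is not CarbonData's `ModularPlan` or expression type.

```scala
// Illustrative stand-in for an expression tree with a timeseries UDF node.
sealed trait Expr
case class Attr(name: String) extends Expr
case class TimeSeries(child: Expr, granularity: String) extends Expr

// Rewrite every timeseries node at granularity `from` to granularity `to`,
// leaving other nodes untouched.
def replaceGranularity(e: Expr, from: String, to: String): Expr = e match {
  case TimeSeries(c, g) if g == from =>
    TimeSeries(replaceGranularity(c, from, to), to)
  case TimeSeries(c, g) =>
    TimeSeries(replaceGranularity(c, from, to), g)
  case other => other
}

val query = TimeSeries(Attr("projectjoindate"), "day")
// Rewriting against an 'hour' MV turns the 'day' call into 'hour':
assert(replaceGranularity(query, "day", "hour") ==
  TimeSeries(Attr("projectjoindate"), "hour"))
```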


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [carbondata] jackylk commented on a change in pull request #3495: [CARBONDATA-3532] Support Query Rollup for MV TimeSeries Queries

2020-01-29 Thread GitBox
jackylk commented on a change in pull request #3495: [CARBONDATA-3532] Support 
Query Rollup for MV TimeSeries Queries
URL: https://github.com/apache/carbondata/pull/3495#discussion_r372783388
 
 

 ##
 File path: 
datamap/mv/core/src/main/scala/org/apache/carbondata/mv/rewrite/Navigator.scala
 ##
 @@ -66,7 +79,100 @@ private[mv] class Navigator(catalog: SummaryDatasetCatalog, session: MVSession)
 }
 }
 if (rewrittenPlan.fastEquals(plan)) {
-  None
+  if (modularPlan.asScala.exists(p => p.sameResult(rewrittenPlan))) {
 
 Review comment:
   please extract a new function and add a comment; this function is too complex


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [carbondata] jackylk commented on a change in pull request #3495: [CARBONDATA-3532] Support Query Rollup for MV TimeSeries Queries

2020-01-29 Thread GitBox
jackylk commented on a change in pull request #3495: [CARBONDATA-3532] Support 
Query Rollup for MV TimeSeries Queries
URL: https://github.com/apache/carbondata/pull/3495#discussion_r372783236
 
 

 ##
 File path: 
datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVHelper.scala
 ##
 @@ -830,18 +831,186 @@ object MVHelper {
*/
  def rewriteWithMVTable(rewrittenPlan: ModularPlan, rewrite: QueryRewrite): ModularPlan = {
if (rewrittenPlan.find(_.rewritten).isDefined) {
-  val updatedDataMapTablePlan = rewrittenPlan transform {
+  var updatedDataMapTablePlan = rewrittenPlan transform {
case s: Select =>
  MVHelper.updateDataMap(s, rewrite)
case g: GroupBy =>
  MVHelper.updateDataMap(g, rewrite)
  }
+  if (rewrittenPlan.rolledUp) {
+// If the rewritten query is rolled up, then rewrite the query based on the original modular
+// plan. Make a new outputList based on original modular plan and wrap rewritten plan with
+// select & group-by nodes with new outputList.
+
+// For example:
+// Given User query:
+// SELECT timeseries(col,'day') from maintable group by timeseries(col,'day')
+// If plan is rewritten as per 'hour' granularity of datamap1,
+// then rewritten query will be like,
+// SELECT datamap1_table.`UDF:timeseries_projectjoindate_hour` AS `UDF:timeseries
+// (projectjoindate, hour)`
+// FROM
+// default.datamap1_table
+// GROUP BY datamap1_table.`UDF:timeseries_projectjoindate_hour`
+//
+// Now, rewrite the rewritten plan as per the 'day' granularity
+// SELECT timeseries(gen_subsumer_0.`UDF:timeseries(projectjoindate, hour)`,'day' ) AS
+// `UDF:timeseries(projectjoindate, day)`
+//  FROM
+//  (SELECT datamap2_table.`UDF:timeseries_projectjoindate_hour` AS `UDF:timeseries
+//  (projectjoindate, hour)`
+//  FROM
+//default.datamap2_table
+//  GROUP BY datamap2_table.`UDF:timeseries_projectjoindate_hour`) gen_subsumer_0
+// GROUP BY timeseries(gen_subsumer_0.`UDF:timeseries(projectjoindate, hour)`,'day' )
+rewrite.modularPlan match {
+  case select: Select =>
+val outputList = select.outputList
+val rolledUpOutputList = updatedDataMapTablePlan.asInstanceOf[Select].outputList
+var finalOutputList: Seq[NamedExpression] = Seq.empty
+val mapping = outputList zip rolledUpOutputList
+val newSubsme = rewrite.newSubsumerName()
+
+for ((s, d) <- mapping) {
+  var name: String = getAliasName(d)
+  s match {
+case a@Alias(scalaUdf: ScalaUDF, aliasName) =>
+  if (scalaUdf.function.isInstanceOf[TimeSeriesFunction]) {
+val newName = newSubsme + ".`" + name + "`"
+val transformedUdf = transformTimeSeriesUdf(scalaUdf, newName)
+finalOutputList = finalOutputList.:+(Alias(transformedUdf, aliasName)(a.exprId,
+  a.qualifier).asInstanceOf[NamedExpression])
+  }
+case Alias(attr: AttributeReference, _) =>
+  finalOutputList = finalOutputList.:+(AttributeReference(name, attr.dataType)(
+exprId = attr.exprId,
+qualifier = Some(newSubsme)))
+case attr: AttributeReference =>
+  finalOutputList = finalOutputList.:+(AttributeReference(name, attr.dataType)(
+exprId = attr.exprId,
+qualifier = Some(newSubsme)))
+  }
+}
+val tChildren = new collection.mutable.ArrayBuffer[ModularPlan]()
+val tAliasMap = new collection.mutable.HashMap[Int, String]()
+
+val sel_plan = select.copy(outputList = finalOutputList,
+  inputList = finalOutputList,
+  predicateList = Seq.empty)
+tChildren += sel_plan
+tAliasMap += (tChildren.indexOf(sel_plan) -> newSubsme)
+updatedDataMapTablePlan = select.copy(outputList = finalOutputList,
 
 Review comment:
   use MV instead of DataMapTable, so change to `updatedMVTablePlan`
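The rolled-up rewrite quoted in the excerpt above re-aggregates hour-level MV rows at day granularity through an outer GROUP BY over the subsumer. A minimal sketch of that re-aggregation with plain Scala collections follows; the row shape, bucket values, and helper names are illustrative assumptions, not the generated plan's actual representation.

```scala
// Sketch of the rollup re-aggregation: hour-level MV rows are grouped again
// at day granularity, standing in for the outer SELECT ... GROUP BY over
// gen_subsumer_0 in the quoted example.
case class MvRow(hourBucket: Long, sum: Long) // epoch seconds floored to hour

def floorTo(ts: Long, granularitySeconds: Long): Long =
  ts - ts % granularitySeconds

// Inner "plan": the hour-granularity MV table contents.
val hourMv = Seq(MvRow(0L, 2L), MvRow(3600L, 3L), MvRow(90000L, 5L))

// Outer group-by: re-bucket each hour row at 'day' and re-aggregate the sum.
val dayRollup: Map[Long, Long] =
  hourMv.groupBy(r => floorTo(r.hourBucket, 86400L))
    .map { case (day, rows) => day -> rows.map(_.sum).sum }

// Hours 0 and 3600 fold into day bucket 0; hour 90000 into day bucket 86400.
assert(dayRollup == Map(0L -> 5L, 86400L -> 5L))
```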


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [carbondata] jackylk commented on a change in pull request #3495: [CARBONDATA-3532] Support Query Rollup for MV TimeSeries Queries

2020-01-29 Thread GitBox
jackylk commented on a change in pull request #3495: [CARBONDATA-3532] Support 
Query Rollup for MV TimeSeries Queries
URL: https://github.com/apache/carbondata/pull/3495#discussion_r372783012
 
 

 ##
 File path: 
datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVHelper.scala
 ##
 @@ -830,18 +831,186 @@ object MVHelper {
*/
  def rewriteWithMVTable(rewrittenPlan: ModularPlan, rewrite: QueryRewrite): ModularPlan = {
if (rewrittenPlan.find(_.rewritten).isDefined) {
-  val updatedDataMapTablePlan = rewrittenPlan transform {
+  var updatedDataMapTablePlan = rewrittenPlan transform {
case s: Select =>
  MVHelper.updateDataMap(s, rewrite)
case g: GroupBy =>
  MVHelper.updateDataMap(g, rewrite)
  }
+  if (rewrittenPlan.rolledUp) {
+// If the rewritten query is rolled up, then rewrite the query based on the original modular
+// plan. Make a new outputList based on original modular plan and wrap rewritten plan with
+// select & group-by nodes with new outputList.
+
+// For example:
+// Given User query:
+// SELECT timeseries(col,'day') from maintable group by timeseries(col,'day')
+// If plan is rewritten as per 'hour' granularity of datamap1,
+// then rewritten query will be like,
+// SELECT datamap1_table.`UDF:timeseries_projectjoindate_hour` AS `UDF:timeseries
+// (projectjoindate, hour)`
+// FROM
+// default.datamap1_table
+// GROUP BY datamap1_table.`UDF:timeseries_projectjoindate_hour`
+//
+// Now, rewrite the rewritten plan as per the 'day' granularity
+// SELECT timeseries(gen_subsumer_0.`UDF:timeseries(projectjoindate, hour)`,'day' ) AS
+// `UDF:timeseries(projectjoindate, day)`
+//  FROM
+//  (SELECT datamap2_table.`UDF:timeseries_projectjoindate_hour` AS `UDF:timeseries
+//  (projectjoindate, hour)`
+//  FROM
+//default.datamap2_table
+//  GROUP BY datamap2_table.`UDF:timeseries_projectjoindate_hour`) gen_subsumer_0
+// GROUP BY timeseries(gen_subsumer_0.`UDF:timeseries(projectjoindate, hour)`,'day' )
+rewrite.modularPlan match {
+  case select: Select =>
+val outputList = select.outputList
+val rolledUpOutputList = updatedDataMapTablePlan.asInstanceOf[Select].outputList
+var finalOutputList: Seq[NamedExpression] = Seq.empty
+val mapping = outputList zip rolledUpOutputList
+val newSubsme = rewrite.newSubsumerName()
+
+for ((s, d) <- mapping) {
+  var name: String = getAliasName(d)
+  s match {
+case a@Alias(scalaUdf: ScalaUDF, aliasName) =>
+  if (scalaUdf.function.isInstanceOf[TimeSeriesFunction]) {
+val newName = newSubsme + ".`" + name + "`"
+val transformedUdf = transformTimeSeriesUdf(scalaUdf, newName)
+finalOutputList = finalOutputList.:+(Alias(transformedUdf, aliasName)(a.exprId,
+  a.qualifier).asInstanceOf[NamedExpression])
+  }
+case Alias(attr: AttributeReference, _) =>
+  finalOutputList = finalOutputList.:+(AttributeReference(name, attr.dataType)(
+exprId = attr.exprId,
+qualifier = Some(newSubsme)))
+case attr: AttributeReference =>
+  finalOutputList = finalOutputList.:+(AttributeReference(name, attr.dataType)(
+exprId = attr.exprId,
+qualifier = Some(newSubsme)))
+  }
+}
+val tChildren = new collection.mutable.ArrayBuffer[ModularPlan]()
+val tAliasMap = new collection.mutable.HashMap[Int, String]()
 
 Review comment:
   please give a more readable name for `tChildren` and `tAliasMap`
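For context on why the rolled-up rewrite is valid at all: a 'day' aggregate can always be recomputed from 'hour' buckets. A minimal standalone sketch of that re-aggregation (illustrative Scala only; `RollUpSketch` and its data are hypothetical, not CarbonData APIs):

```scala
object RollUpSketch {
  def main(args: Array[String]): Unit = {
    // Pretend rows of an hour-granularity MV table: (timestamp truncated to hour, count)
    val hourly = Seq(
      ("2020-01-29 01:00", 10L),
      ("2020-01-29 02:00", 5L),
      ("2020-01-30 01:00", 7L))

    // timeseries(col,'day'): re-group the hour buckets by their day prefix and sum
    val daily = hourly
      .groupBy { case (hour, _) => hour.take(10) } // "yyyy-MM-dd HH:mm" -> "yyyy-MM-dd"
      .map { case (day, rows) => (day, rows.map(_._2).sum) }

    assert(daily("2020-01-29") == 15L)
    assert(daily("2020-01-30") == 7L)
  }
}
```

This is exactly the shape of the wrapping described in the comment: the inner query reads the hour-granularity MV table, the outer select/group-by re-aggregates to day granularity.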


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [carbondata] jackylk commented on a change in pull request #3495: [CARBONDATA-3532] Support Query Rollup for MV TimeSeries Queries

2020-01-29 Thread GitBox
jackylk commented on a change in pull request #3495: [CARBONDATA-3532] Support 
Query Rollup for MV TimeSeries Queries
URL: https://github.com/apache/carbondata/pull/3495#discussion_r372782254
 
 

 ##
 File path: datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVHelper.scala
 ##
 @@ -830,18 +831,186 @@ object MVHelper {
*/
   def rewriteWithMVTable(rewrittenPlan: ModularPlan, rewrite: QueryRewrite): ModularPlan = {
 if (rewrittenPlan.find(_.rewritten).isDefined) {
-  val updatedDataMapTablePlan = rewrittenPlan transform {
+  var updatedDataMapTablePlan = rewrittenPlan transform {
 case s: Select =>
   MVHelper.updateDataMap(s, rewrite)
 case g: GroupBy =>
   MVHelper.updateDataMap(g, rewrite)
   }
+  if (rewrittenPlan.rolledUp) {
+// If the rewritten query is rolled up, then rewrite the query based on the original modular
+// plan. Make a new outputList based on original modular plan and wrap rewritten plan with
+// select & group-by nodes with new outputList.
+
+// For example:
+// Given User query:
+// SELECT timeseries(col,'day') from maintable group by timeseries(col,'day')
+// If plan is rewritten as per 'hour' granularity of datamap1,
+// then rewritten query will be like,
+// SELECT datamap1_table.`UDF:timeseries_projectjoindate_hour` AS `UDF:timeseries
+// (projectjoindate, hour)`
+// FROM
+// default.datamap1_table
+// GROUP BY datamap1_table.`UDF:timeseries_projectjoindate_hour`
+//
+// Now, rewrite the rewritten plan as per the 'day' granularity
+// SELECT timeseries(gen_subsumer_0.`UDF:timeseries(projectjoindate, hour)`,'day' ) AS
+// `UDF:timeseries(projectjoindate, day)`
+//  FROM
+//  (SELECT datamap2_table.`UDF:timeseries_projectjoindate_hour` AS `UDF:timeseries
+//  (projectjoindate, hour)`
+//  FROM
+//default.datamap2_table
+//  GROUP BY datamap2_table.`UDF:timeseries_projectjoindate_hour`) gen_subsumer_0
+// GROUP BY timeseries(gen_subsumer_0.`UDF:timeseries(projectjoindate, hour)`,'day' )
+rewrite.modularPlan match {
+  case select: Select =>
+val outputList = select.outputList
+val rolledUpOutputList = updatedDataMapTablePlan.asInstanceOf[Select].outputList
+var finalOutputList: Seq[NamedExpression] = Seq.empty
+val mapping = outputList zip rolledUpOutputList
+val newSubsme = rewrite.newSubsumerName()
+
+for ((s, d) <- mapping) {
 
 Review comment:
   ```suggestion
    mapping.foreach { case (s, d) =>
   ```
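For readers skimming the thread: the suggested `foreach` with a pattern match is behaviorally identical to the `for ((s, d) <- mapping)` comprehension in the patch; the compiler desugars the for-comprehension into the same call. A minimal standalone sketch (hypothetical values, not CarbonData types):

```scala
object ZipStyles {
  def main(args: Array[String]): Unit = {
    val outputList = Seq("a", "b", "c")
    val rolledUpOutputList = Seq(1, 2, 3)
    val mapping = outputList zip rolledUpOutputList

    // Style used in the patch: for-comprehension with tuple destructuring
    var viaFor = Seq.empty[String]
    for ((s, d) <- mapping) {
      viaFor = viaFor :+ s"$s=$d"
    }

    // Style proposed in the review: foreach with a pattern-matching function
    var viaForeach = Seq.empty[String]
    mapping.foreach { case (s, d) =>
      viaForeach = viaForeach :+ s"$s=$d"
    }

    assert(viaFor == viaForeach) // the two styles produce identical results
  }
}
```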




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3597: [CARBONDATA-3575] Remove redundant exception throws

2020-01-29 Thread GitBox
CarbonDataQA1 commented on issue #3597: [CARBONDATA-3575] Remove redundant 
exception throws
URL: https://github.com/apache/carbondata/pull/3597#issuecomment-580106803
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1800/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3597: [CARBONDATA-3575] Remove redundant exception throws

2020-01-29 Thread GitBox
CarbonDataQA1 commented on issue #3597: [CARBONDATA-3575] Remove redundant 
exception throws
URL: https://github.com/apache/carbondata/pull/3597#issuecomment-579954320
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1799/
   




[GitHub] [carbondata] jackylk opened a new pull request #3597: [CARBONDATA-3575] Remove redundant exception throws

2020-01-29 Thread GitBox
jackylk opened a new pull request #3597: [CARBONDATA-3575] Remove redundant 
exception throws
URL: https://github.com/apache/carbondata/pull/3597
 
 
### Why is this PR needed?
After 

### What changes were proposed in this PR?
   
   
### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)
   
### Is any new testcase added?
- No
- Yes
   
   
   




[jira] [Created] (CARBONDATA-3675) Remove redundant throws

2020-01-29 Thread Jacky Li (Jira)
Jacky Li created CARBONDATA-3675:


 Summary: Remove redundant throws
 Key: CARBONDATA-3675
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3675
 Project: CarbonData
  Issue Type: Sub-task
Reporter: Jacky Li






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3596: [CARBONDATA-3673] Remove unused declare

2020-01-29 Thread GitBox
CarbonDataQA1 commented on issue #3596: [CARBONDATA-3673] Remove unused declare
URL: https://github.com/apache/carbondata/pull/3596#issuecomment-579842075
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1798/
   




[GitHub] [carbondata] jackylk opened a new pull request #3596: [CARBONDATA-3673] Remove unused declare

2020-01-29 Thread GitBox
jackylk opened a new pull request #3596: [CARBONDATA-3673] Remove unused declare
URL: https://github.com/apache/carbondata/pull/3596
 
 
### Why is this PR needed?
   After the global dictionary and custom partition features were deprecated, some 
code paths are no longer used. It is better to refactor them.
   
### What changes were proposed in this PR?
   This PR removes unused declarations.
   
### Does this PR introduce any user interface change?
- No
   
### Is any new testcase added?
- No
   
   




[jira] [Created] (CARBONDATA-3674) Remove Encoding.DICTIONARY and Encoding.DIRECT_DICTIONARY usage

2020-01-29 Thread Jacky Li (Jira)
Jacky Li created CARBONDATA-3674:


 Summary: Remove Encoding.DICTIONARY and Encoding.DIRECT_DICTIONARY 
usage
 Key: CARBONDATA-3674
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3674
 Project: CarbonData
  Issue Type: Sub-task
Reporter: Jacky Li








[jira] [Created] (CARBONDATA-3673) Remove unused declaration

2020-01-29 Thread Jacky Li (Jira)
Jacky Li created CARBONDATA-3673:


 Summary: Remove unused declaration
 Key: CARBONDATA-3673
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3673
 Project: CarbonData
  Issue Type: Sub-task
Reporter: Jacky Li








[jira] [Created] (CARBONDATA-3672) Refactoring by IDE analyzer

2020-01-29 Thread Jacky Li (Jira)
Jacky Li created CARBONDATA-3672:


 Summary: Refactoring by IDE analyzer
 Key: CARBONDATA-3672
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3672
 Project: CarbonData
  Issue Type: Improvement
Reporter: Jacky Li


After the global dictionary and custom partition features were deprecated, some 
code paths are no longer used. It is better to refactor them.





[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.

2020-01-29 Thread GitBox
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later 
optimize insert.
URL: https://github.com/apache/carbondata/pull/3538#issuecomment-579834563
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1797/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF

2020-01-29 Thread GitBox
CarbonDataQA1 commented on issue #3436: [CARBONDATA-3548]Geospatial Support: 
Modified to create and load the table with a nonschema dimension sort column. 
And added InPolygon UDF
URL: https://github.com/apache/carbondata/pull/3436#issuecomment-579811317
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1796/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.

2020-01-29 Thread GitBox
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later 
optimize insert.
URL: https://github.com/apache/carbondata/pull/3538#issuecomment-579763503
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1795/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF

2020-01-29 Thread GitBox
CarbonDataQA1 commented on issue #3436: [CARBONDATA-3548]Geospatial Support: 
Modified to create and load the table with a nonschema dimension sort column. 
And added InPolygon UDF
URL: https://github.com/apache/carbondata/pull/3436#issuecomment-579742732
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1794/
   




[jira] [Created] (CARBONDATA-3671) Support compress direct bytebuffer in the SNAPPY/ZSTD/GZIP compressor

2020-01-29 Thread Xingjun Hao (Jira)
Xingjun Hao created CARBONDATA-3671:
---

 Summary: Support compress direct bytebuffer in the 
SNAPPY/ZSTD/GZIP compressor
 Key: CARBONDATA-3671
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3671
 Project: CarbonData
  Issue Type: Sub-task
Reporter: Xingjun Hao








[jira] [Created] (CARBONDATA-3670) Support compress offheap columnpage directly, avoiding a copy of data from offheap to heap when compressed.

2020-01-29 Thread Xingjun Hao (Jira)
Xingjun Hao created CARBONDATA-3670:
---

 Summary: Support compress offheap columnpage directly, avoiding a 
copy of data from offheap to heap when compressed.
 Key: CARBONDATA-3670
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3670
 Project: CarbonData
  Issue Type: Wish
  Components: core
Affects Versions: 2.0.0
Reporter: Xingjun Hao
 Fix For: 2.0.0


When writing data, the column pages are stored offheap, and the pages are 
compressed to save storage cost. Currently, during compression the data is 
copied from the offheap to the heap before being compressed, which leads to 
heavier GC overhead compared with compressing offheap directly.

To sum up, we support compressing offheap column pages directly, avoiding a 
copy of data from offheap to heap during compression.
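The idea can be sketched with plain JDK 11+ `java.util.zip` (illustrative only, not CarbonData's compressor abstraction): `Deflater.setInput(ByteBuffer)` consumes a direct buffer without first copying it into a heap `byte[]`.

```scala
import java.nio.ByteBuffer
import java.util.zip.Deflater

object DirectCompressSketch {
  def main(args: Array[String]): Unit = {
    // Stand-in for an offheap column page: a direct (non-heap) buffer
    val page = ByteBuffer.allocateDirect(1024)
    (0 until 1024).foreach(i => page.put((i % 8).toByte))
    page.flip()

    val deflater = new Deflater()
    deflater.setInput(page) // ByteBuffer overload (JDK 11+): no heap copy of the input
    deflater.finish()
    val compressed = new Array[Byte](2048)
    val n = deflater.deflate(compressed)
    deflater.end()

    assert(n > 0 && n < 1024) // the highly repetitive page compresses well
  }
}
```

The heap-copy path the issue describes would instead materialize the page as a `byte[]` before calling `setInput(byte[])`, which is the extra allocation and GC pressure being avoided.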





[GitHub] [carbondata] CarbonDataQA1 commented on issue #3583: [WIP] Support CarbonOutputFormat in Hive

2020-01-29 Thread GitBox
CarbonDataQA1 commented on issue #3583: [WIP] Support CarbonOutputFormat in 
Hive 
URL: https://github.com/apache/carbondata/pull/3583#issuecomment-579731431
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1790/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3583: [WIP] Support CarbonOutputFormat in Hive

2020-01-29 Thread GitBox
CarbonDataQA1 commented on issue #3583: [WIP] Support CarbonOutputFormat in 
Hive 
URL: https://github.com/apache/carbondata/pull/3583#issuecomment-579729895
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1793/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF

2020-01-29 Thread GitBox
CarbonDataQA1 commented on issue #3436: [CARBONDATA-3548]Geospatial Support: 
Modified to create and load the table with a nonschema dimension sort column. 
And added InPolygon UDF
URL: https://github.com/apache/carbondata/pull/3436#issuecomment-579724633
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1792/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF

2020-01-29 Thread GitBox
CarbonDataQA1 commented on issue #3436: [CARBONDATA-3548]Geospatial Support: 
Modified to create and load the table with a nonschema dimension sort column. 
And added InPolygon UDF
URL: https://github.com/apache/carbondata/pull/3436#issuecomment-579722776
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1791/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3595: [WIP] remove Encoding.DICTIONARY and Encoding.DIRECT_DICTIONARY usage

2020-01-29 Thread GitBox
CarbonDataQA1 commented on issue #3595: [WIP] remove Encoding.DICTIONARY and 
Encoding.DIRECT_DICTIONARY usage
URL: https://github.com/apache/carbondata/pull/3595#issuecomment-579718709
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1789/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3595: [WIP] remove Encoding.DICTIONARY and Encoding.DIRECT_DICTIONARY usage

2020-01-29 Thread GitBox
CarbonDataQA1 commented on issue #3595: [WIP] remove Encoding.DICTIONARY and 
Encoding.DIRECT_DICTIONARY usage
URL: https://github.com/apache/carbondata/pull/3595#issuecomment-579685385
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1788/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.

2020-01-29 Thread GitBox
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later 
optimize insert.
URL: https://github.com/apache/carbondata/pull/3538#issuecomment-579678870
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1787/
   




[GitHub] [carbondata] wangyi-fudan commented on issue #3444: [CARBONDATA-3581] Support page level bloom filter

2020-01-29 Thread GitBox
wangyi-fudan commented on issue #3444: [CARBONDATA-3581] Support page level 
bloom filter
URL: https://github.com/apache/carbondata/pull/3444#issuecomment-579653894
 
 
   Just mentioning wyhash, the fastest high-quality conventional hash function:
   https://github.com/wangyi-fudan/wyhash
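For readers new to the PR: the page-level bloom filter under discussion can be sketched generically. This is a toy with a made-up hash, not wyhash and not CarbonData's implementation; it only shows how a per-page filter lets a point lookup skip pages:

```scala
object PageBloomSketch {
  // Toy Bloom filter: k simple hash functions over a small bit set
  final class Bloom(bits: Int, k: Int) {
    private val set = new java.util.BitSet(bits)
    // Hypothetical mixing hash, chosen only for the sketch
    private def idx(v: Long, i: Int): Int =
      math.abs(((v * (i * 2 + 1) + 0x9E3779B9L) % bits).toInt)
    def add(v: Long): Unit = (0 until k).foreach(i => set.set(idx(v, i)))
    def mightContain(v: Long): Boolean = (0 until k).forall(i => set.get(idx(v, i)))
  }

  def main(args: Array[String]): Unit = {
    // Two "pages" of column values, each with its own filter
    val pages = Seq(Seq(1L, 2L, 3L), Seq(100L, 200L, 300L))
    val filters = pages.map { p => val b = new Bloom(256, 3); p.foreach(b.add); b }

    // Point lookup: only scan pages whose filter might contain the value
    val candidates = pages.zip(filters).filter(_._2.mightContain(200L)).map(_._1)
    assert(candidates.exists(_.contains(200L))) // the true page is never skipped
  }
}
```

A false positive only costs an unnecessary page scan; a Bloom filter never skips a page that actually holds the value, which is why the choice of hash (wyhash or otherwise) affects speed and false-positive rate but not correctness.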

