[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.

2021-01-18 Thread GitBox


CarbonDataQA2 commented on pull request #4070:
URL: https://github.com/apache/carbondata/pull/4070#issuecomment-762637383


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3553/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.

2021-01-18 Thread GitBox


CarbonDataQA2 commented on pull request #4070:
URL: https://github.com/apache/carbondata/pull/4070#issuecomment-762634833


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5313/
   







[GitHub] [carbondata] Karan980 commented on pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.

2021-01-18 Thread GitBox


Karan980 commented on pull request #4070:
URL: https://github.com/apache/carbondata/pull/4070#issuecomment-762604221


   retest this please







[GitHub] [carbondata] QiangCai commented on a change in pull request #4025: [WIP] Make TableStatus/UpdateTableStatus/SegmentFile Smaller

2021-01-18 Thread GitBox


QiangCai commented on a change in pull request #4025:
URL: https://github.com/apache/carbondata/pull/4025#discussion_r559874151



##
File path: core/pom.xml
##
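@@ -64,6 +64,11 @@
       <artifactId>expiringmap</artifactId>
       <version>0.5.9</version>
     </dependency>
+    <dependency>
+      <groupId>com.alibaba</groupId>
+      <artifactId>fastjson</artifactId>
+      <version>1.2.68</version>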

Review comment:
   change to 1.2.75









[GitHub] [carbondata] QiangCai commented on pull request #4076: [CARBONDATA-4107] Added mvExists property for MV fact table and added lock while touchMDTFile

2021-01-18 Thread GitBox


QiangCai commented on pull request #4076:
URL: https://github.com/apache/carbondata/pull/4076#issuecomment-762546050


   Yes, it avoids listing all tables of some databases.







[GitHub] [carbondata] nihal0107 commented on pull request #4071: [CARBONDATA-4102] Added UT and FT to improve coverage of SI module.

2021-01-18 Thread GitBox


nihal0107 commented on pull request #4071:
URL: https://github.com/apache/carbondata/pull/4071#issuecomment-762305509


   
![before2](https://user-images.githubusercontent.com/32429250/104931386-56d60880-59cc-11eb-9914-c38b30f99c72.png)
   
![after1](https://user-images.githubusercontent.com/32429250/104931415-5f2e4380-59cc-11eb-8dde-5c5f2e46feec.png)
   
   
   @Indhumathi27 please find the comparison between master and the current branch







[GitHub] [carbondata] nihal0107 removed a comment on pull request #4071: [CARBONDATA-4102] Added UT and FT to improve coverage of SI module.

2021-01-18 Thread GitBox


nihal0107 removed a comment on pull request #4071:
URL: https://github.com/apache/carbondata/pull/4071#issuecomment-762304031


   @Indhumathi27 please find the comparison with master branch and current 
branch
   
   x-special/nautilus-clipboard
   copy
   file:///home/nihal/Pictures/after1.png
   file:///home/nihal/Pictures/before2.png
   







[GitHub] [carbondata] nihal0107 commented on pull request #4071: [CARBONDATA-4102] Added UT and FT to improve coverage of SI module.

2021-01-18 Thread GitBox


nihal0107 commented on pull request #4071:
URL: https://github.com/apache/carbondata/pull/4071#issuecomment-762304031


   @Indhumathi27 please find the comparison with master branch and current 
branch
   
   x-special/nautilus-clipboard
   copy
   file:///home/nihal/Pictures/after1.png
   file:///home/nihal/Pictures/before2.png
   







[GitHub] [carbondata] Indhumathi27 edited a comment on pull request #4076: [CARBONDATA-4107] Added mvExists property for MV fact table and added lock while touchMDTFile

2021-01-18 Thread GitBox


Indhumathi27 edited a comment on pull request #4076:
URL: https://github.com/apache/carbondata/pull/4076#issuecomment-762229135


   > @Indhumathi27 if the number of MV tables is less than 10 (configurable),
maybe we can list all tables; otherwise we can store only the database
information.
   
   Actually, here we are reading MV schemas. MV schemas for a fact table can be
present in any database (join scenarios), so we also need to know the database.
We would therefore have to store Database_Name.MV_Name in the fact table, which
can increase the fact table size. Do you still want to store the MV tables in
the fact table?
   
   @QiangCai  @akashrn5 







[GitHub] [carbondata] Indhumathi27 commented on pull request #4076: [CARBONDATA-4107] Added mvExists property for MV fact table and added lock while touchMDTFile

2021-01-18 Thread GitBox


Indhumathi27 commented on pull request #4076:
URL: https://github.com/apache/carbondata/pull/4076#issuecomment-762229135


   > @Indhumathi27 if the number of MV tables is less than 10 (configurable),
maybe we can list all tables; otherwise we can store only the database
information.
   
   Actually, here we are reading MV schemas. MV schemas for a fact table can be
present in any database (join scenarios), so we also need to know the database.
We would therefore have to store Database_Name.MV_Name in the fact table, which
can increase the fact table size. Do you still want to store the MV tables in
the fact table?







[jira] [Commented] (CARBONDATA-4101) Carbondata Connectivity via JDBC driver

2021-01-18 Thread Nihal kumar ojha (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17267164#comment-17267164
 ] 

Nihal kumar ojha commented on CARBONDATA-4101:
--

Hi Rohit,

    We can connect to CarbonData using the JDBC connector.

Please follow [https://carbondata.apache.org/quick-start-guide.html] to
understand the integration of CarbonData with different engines such as Spark,
Presto, Hive, and Flink, and let us know if you have any doubts.
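
A minimal sketch of what such a JDBC connection might look like from a
third-party Java application. It assumes that CarbonData tables are exposed
through a Spark Thrift Server speaking the HiveServer2 protocol, that the Hive
JDBC driver is on the classpath, and that the server listens on port 10000 with
a database named "default". The class name, host argument, port, database, and
empty credentials are placeholders for illustration and are not taken from this
thread.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class CarbonJdbcExample {

    // Builds a HiveServer2-style JDBC URL. Host, port, and database are
    // assumptions that must match the actual Thrift Server deployment.
    static String buildUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 1) {
            System.out.println("usage: CarbonJdbcExample <thrift-server-host>");
            return;
        }
        String url = buildUrl(args[0], 10000, "default");

        // Requires the Hive JDBC driver on the classpath (assumption).
        // Empty user name and password are placeholders.
        try (Connection conn = DriverManager.getConnection(url, "", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT 1")) {
            while (rs.next()) {
                // A trivial query just to confirm that the connection works.
                System.out.println("Connected, test query returned: " + rs.getInt(1));
            }
        }
    }
}
```

Once the connection succeeds, the same Statement can be used to run ordinary
SQL against tables visible to the Thrift Server session.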

 

Regards,

Nihal kumar ojha

> Carbondata Connectivity via JDBC driver
> ---
>
> Key: CARBONDATA-4101
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4101
> Project: CarbonData
>  Issue Type: Task
>  Components: other
>Reporter: Rohit Paranjape
>Priority: Blocker
>
> Hello Team,
>  
> We are working on a POC in which we want to connect to CarbonData from our
> third-party application using a JDBC connector.
>  
> Can we connect to CarbonData using JDBC? If yes, what is the procedure, and
> if not, what are the possible options for connecting to CarbonData from a
> third-party application?
>  
> Please share your inputs on the same.
>  
> Thanks & Regards,
> Rohit Paranjape



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (CARBONDATA-4106) Compaction is not working properly

2021-01-18 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat closed CARBONDATA-4106.

Fix Version/s: (was: 2.0.1)
   Resolution: Not A Bug

> Compaction is not working properly
> --
>
> Key: CARBONDATA-4106
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4106
> Project: CarbonData
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.0.1
> Environment: Apache spark 2.4.5, carbonData 2.0.1
>Reporter: suyash yadav
>Priority: Major
> Attachments: describe_fact_probe_1
>
>
> Hi Team,
> We are using Apache CarbonData 2.0.1 for one of our POCs, and we observed that
> we are not getting a proper benefit from compaction (both major and minor).
> Please find below the details of the issue we are facing:
> *Name of the table used*: fact_365_1_probe_1
> *Number of rows:*
> select count(*) from fact_365_1_probe_1
> +--------+
> |count(1)|
> +--------+
> |76963753|
> *Sample data from the table:*
> +-------------------+--------------------------+------------------------------------+------------------+-------------+-------------------+
> |                 ts|                    metric|                             tags_id|             value|        epoch|                ts2|
> +-------------------+--------------------------+------------------------------------+------------------+-------------+-------------------+
> |2021-01-07 21:05:00|Probe.Duplicate.Poll.Count|c8dead9b-87ae-46ae-8703-bc2b7bfba5d4|39.611356797970274|1610033757768|2021-01-07 00:00:00|
> |2021-01-07 23:50:00|Probe.Duplicate.Poll.Count|62351ef2-f2ce-49d1-a2fd-a0d1e5f6a1b9| 72.70658115131307|1610043742516|2021-01-07 00:00:00|
>  
> [^describe_fact_probe_1]
>  
> I have attached the describe output, which shows the other details of the
> table.
> The size of the table is 3.24 GB, and even after running minor or major
> compaction the size remains almost the same.
> So we are not getting any benefit from running the compaction. Could you please
> review the shared details and help us identify whether we are missing
> something here or whether there is a bug?
> We also need answers to the following questions about CarbonData storage:
> 1. For decimal values, how does the storage behave? For example, if one row has
> 20 digits after the decimal point and a second row has only 5, what would be
> the difference in the storage taken?
> 2. If I have two tables, one with the same values in 100 rows and the other
> with different values in 100 rows, how will CarbonData behave as far as
> storage is concerned? Which table will take less storage, or will both take
> the same storage?
> 3. For the string data type, could you please describe how the storage is
> defined?
>  
> 





[jira] [Commented] (CARBONDATA-4106) Compaction is not working properly

2021-01-18 Thread Ajantha Bhat (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17267160#comment-17267160
 ] 

Ajantha Bhat commented on CARBONDATA-4106:
--

Closing the defect, as it is not a bug; the current compaction is not useful
in this corner case.

> Compaction is not working properly
> --
>
> Key: CARBONDATA-4106
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4106
> Project: CarbonData
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.0.1
> Environment: Apache spark 2.4.5, carbonData 2.0.1
>Reporter: suyash yadav
>Priority: Major
> Fix For: 2.0.1
>
> Attachments: describe_fact_probe_1
>
>





[GitHub] [carbondata] QiangCai commented on pull request #4076: [CARBONDATA-4107] Added mvExists property for MV fact table and added lock while touchMDTFile

2021-01-18 Thread GitBox


QiangCai commented on pull request #4076:
URL: https://github.com/apache/carbondata/pull/4076#issuecomment-762096880


   @Indhumathi27 if the number of MV tables is less than 10 (configurable),
maybe we can list all tables; otherwise we can store only the database
information.
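
The rule suggested above could be sketched roughly as follows. This is only an
illustration of the threshold idea, not code from the pull request: the class
name, method name, and the "database.mvName" string format are hypothetical,
and the threshold of 10 is the configurable default mentioned in the comment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class MvReferencePolicy {

    // Hypothetical default; the comment above suggests it should be configurable.
    static final int DEFAULT_THRESHOLD = 10;

    // Decides what a fact table would record about its MVs.
    // qualifiedMvNames holds entries in the assumed form "database.mvName".
    static List<String> referencesToStore(List<String> qualifiedMvNames, int threshold) {
        if (qualifiedMvNames.size() < threshold) {
            // Few MVs: store every fully qualified MV name.
            return new ArrayList<>(qualifiedMvNames);
        }
        // Many MVs: store only the distinct database names.
        return qualifiedMvNames.stream()
                .map(name -> {
                    int dot = name.indexOf('.');
                    return dot < 0 ? name : name.substring(0, dot);
                })
                .distinct()
                .collect(Collectors.toList());
    }
}
```

With this shape, the size of what the fact table stores stays bounded by the
threshold or by the number of databases, whichever case applies.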







[GitHub] [carbondata] QiangCai commented on pull request #3988: [CARBONDATA-4037] Improve the table status and segment file writing

2021-01-18 Thread GitBox


QiangCai commented on pull request #3988:
URL: https://github.com/apache/carbondata/pull/3988#issuecomment-762090072


   How about skipping the carbonindex files step and generating the
carbonindexmerge files directly on the driver side?


