[GitHub] [carbondata] nihal0107 commented on issue #4173: DELETE FROM TABLE default.test_table WHERE SEGMENT.ID IN reported an error in benline

2021-07-20 Thread GitBox


nihal0107 commented on issue #4173:
URL: https://github.com/apache/carbondata/issues/4173#issuecomment-883917592


   As I can see in the output of `show segments`, the segments with id 0 and 1 
are already marked for delete, which means they are no longer valid. You can 
run the `clean files` command once to remove these unnecessary segments. In the 
delete command, give only segment ids whose status is success.
   Something similar to `DELETE FROM TABLE test_table WHERE SEGMENT.ID IN 
(2.3)`
   After executing this query, the segment status will be `marked for delete`.
   You can remove all such segments (marked for delete, compacted) with clean 
files.
   Refer to this: 
https://github.com/apache/carbondata/blob/master/docs/clean-files.md
   You can use the force option for clean files, depending on your requirement.
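
   The rule above (delete by id only segments whose status is success, then run clean files) can be sketched as a small script. This is only an illustration: the helper names `deletable_segment_ids` and `delete_by_segment_id` are hypothetical and not part of CarbonData, the segment rows are copied from the `show segments` output in this thread, and the `CLEAN FILES FOR TABLE` form is assumed from the clean-files document linked above.

```python
# Hypothetical helpers (not part of CarbonData): pick deletable segments from
# SHOW SEGMENTS rows and build the statements discussed in the comment above.

def deletable_segment_ids(segments):
    """Return the ids of segments whose status is Success."""
    return [seg_id for seg_id, status in segments
            if status.strip().lower() == "success"]


def delete_by_segment_id(table, segment_ids):
    """Build a DELETE FROM TABLE ... WHERE SEGMENT.ID IN (...) statement."""
    if not segment_ids:
        raise ValueError("no segment ids to delete")
    return "DELETE FROM TABLE {} WHERE SEGMENT.ID IN ({})".format(
        table, ", ".join(segment_ids))


# A few (id, status) pairs copied from the SHOW SEGMENTS output in this thread.
segments = [
    ("2.3", "Success"),
    ("2.2", "Compacted"),
    ("1", "Marked for Delete"),
    ("0", "Marked for Delete"),
]

ids = deletable_segment_ids(segments)
statements = [
    delete_by_segment_id("test_table", ids),
    # Assumed from clean-files.md: cleans up Marked for Delete / Compacted segments.
    "CLEAN FILES FOR TABLE test_table",
]
for stmt in statements:
    print(stmt)
```

   Segments 0 and 1 are skipped because they are already marked for delete, so only segment 2.3 ends up in the `IN (...)` list.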


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@carbondata.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] study-day commented on issue #4173: DELETE FROM TABLE default.test_table WHERE SEGMENT.ID IN reported an error in benline

2021-07-20 Thread GitBox


study-day commented on issue #4173:
URL: https://github.com/apache/carbondata/issues/4173#issuecomment-883819350


   Hi, the error also happens when using Spark beeline:
   ```
   [hdfs@hadoop-node-1 spark-2.3.4-bin-hadoop2.7]$ bin/beeline
   Beeline version 1.2.1.spark2 by Apache Hive
   beeline> !connecot jdbc:hive2://hadoop-node-1:1
   Unknown command: connecot jdbc:hive2://hadoop-node-1:1
   beeline> !connect jdbc:hive2://hadoop-node-1:1
   Connecting to jdbc:hive2://hadoop-node-1:1
   Enter username for jdbc:hive2://hadoop-node-1:1: hd123
   Enter password for jdbc:hive2://hadoop-node-1:1: **
   Connected to: Spark SQL (version 2.3.4)
   Driver: Hive JDBC (version 1.2.1.spark2)
   Transaction isolation: TRANSACTION_REPEATABLE_READ
   0: jdbc:hive2://hadoop-node-1:1> show segments for table  test_table;
   
   +------+-------------------+-------------------------+-----------------+-----------+-----------+------------+-------------+
   | ID   | Status            | Load Start Time         | Load Time Taken | Partition | Data Size | Index Size | File Format |
   +------+-------------------+-------------------------+-----------------+-----------+-----------+------------+-------------+
   | 21   | Compacted         | 2021-07-09 09:22:41.538 | 7.399S          | NA        | 619.53KB  | 54.21KB    | columnar_v3 |
   | 20   | Compacted         | 2021-07-08 18:15:33.536 | 1.454S          | NA        | 411.54KB  | 54.02KB    | columnar_v3 |
   | 19   | Compacted         | 2021-07-08 18:14:44.265 | 8.104S          | NA        | 259.04KB  | 53.96KB    | columnar_v3 |
   | 18   | Compacted         | 2021-07-08 18:09:25.752 | 7.792S          | NA        | 178.86KB  | 53.90KB    | columnar_v3 |
   | 17   | Compacted         | 2021-07-08 18:09:02.815 | 5.136S          | NA        | 88.90KB   | 26.86KB    | columnar_v3 |
   | 16.1 | Compacted         | 2021-07-12 13:51:47.44  | 2.452S          | NA        | 390.78KB  | 54.30KB    | columnar_v3 |
   | 16   | Compacted         | 2021-07-08 18:03:54.558 | 7.348S          | NA        | 44.62KB   | 13.42KB    | columnar_v3 |
   | 15   | Compacted         | 2021-07-08 15:03:17.527 | 1.354S          | NA        | 12.61KB   | 1.29KB     | columnar_v3 |
   | 14   | Compacted         | 2021-07-08 14:32:53.337 | 0.485S          | NA        | 7.48KB    | 1.29KB     | columnar_v3 |
   | 13   | Compacted         | 2021-07-08 14:32:36.673 | 0.44S           | NA        | 4.83KB    | 1.28KB     | columnar_v3 |
   | 12.1 | Compacted         | 2021-07-12 13:51:47.44  | 1.122S          | NA        | 22.06KB   | 1.30KB     | columnar_v3 |
   | 12   | Compacted         | 2021-07-08 14:30:41.506 | 0.43S           | NA        | 3.59KB    | 1.28KB     | columnar_v3 |
   | 11   | Compacted         | 2021-07-08 14:29:57.866 | 0.436S          | NA        | 2.95KB    | 1.27KB     | columnar_v3 |
   | 10   | Compacted         | 2021-07-08 14:29:45.201 | 0.445S          | NA        | 2.57KB    | 1.27KB     | columnar_v3 |
   | 9    | Compacted         | 2021-07-08 14:28:36.513 | 0.438S          | NA        | 2.38KB    | 1.27KB     | columnar_v3 |
   | 8.1  | Compacted         | 2021-07-12 13:51:47.44  | 0.837S          | NA        | 3.52KB    | 1.28KB     | columnar_v3 |
   | 8    | Compacted         | 2021-07-08 14:27:50.502 | 0.541S          | NA        | 2.28KB    | 1.26KB     | columnar_v3 |
   | 7    | Compacted         | 2021-07-08 14:27:08.431 | 0.49S           | NA        | 2.20KB    | 1.26KB     | columnar_v3 |
   | 6    | Marked for Delete | 2021-07-08 10:48:47.684 | 0.386S          | NA        | 1.08KB    | 656.0B     | columnar_v3 |
   | 5    | Compacted         | 2021-07-08 10:44:38.283 | 14.552S         | NA        | 1.06KB    | 646.0B     | columnar_v3 |
   | 4    | Compacted         | 2021-07-08 10:43:51.58  | 14.259S         | NA        | 1.05KB    | 644.0B     | columnar_v3 |
   | 3    | Marked for Delete | 2021-07-08 10:43:19.104 | 16.868S         | NA        | 1.05KB    | 644.0B     | columnar_v3 |
   | 2.3  | Success           | 2021-07-12 13:52:15.043 | 1.342S          | NA        | 1.14MB    | 54.60KB    | columnar_v3 |
   | 2.2  | Compacted         | 2021-07-12 13:51:47.44  | 1.389S          | NA        | 23.36KB   | 1.30KB     | columnar_v3 |
   | 2.1  | Compacted         | 2021-07-12 13:51:47.44  | 0.56S           | NA        | 2.28KB    | 1.27KB     | columnar_v3 |
   | 2    | Compacted         | 2021-07-08 10:27:01.657 | 0.487S          | NA        | 1.14KB    | 659.0B     | columnar_v3 |
   | 1    | Marked for Delete | 2021-07-08 10:21:01.823 | 0.45S           | NA        | 1.06KB    | 646.0B     | columnar_v3 |
   | 0    | Marked for Delete | 2021-07-08 10:20:36.083 

[GitHub] [carbondata] study-day commented on issue #4173: DELETE FROM TABLE default.test_table WHERE SEGMENT.ID IN reported an error in benline

2021-07-20 Thread GitBox


study-day commented on issue #4173:
URL: https://github.com/apache/carbondata/issues/4173#issuecomment-883815940


   1. It is Hive beeline:
   ```
   0: jdbc:hive2://hadoop-node-1:1> show create table  test_table;
   
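   +----------------------------------------------------------------------------------+
   | createtab_stmt                                                                   |
   +----------------------------------------------------------------------------------+
   | CREATE TABLE `test_table` (`id` STRING, `name` STRING, `city` STRING, `age` INT)
   USING carbondata
   OPTIONS (
     `indexInfo` '[]'
   )
    |
   +----------------------------------------------------------------------------------+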
   1 row selected (0.493 seconds)
   
   ```






[GitHub] [carbondata] nihal0107 edited a comment on issue #4173: DELETE FROM TABLE default.test_table WHERE SEGMENT.ID IN reported an error in benline

2021-07-20 Thread GitBox


nihal0107 edited a comment on issue #4173:
URL: https://github.com/apache/carbondata/issues/4173#issuecomment-883222462


   Can you please share details of where you are running these queries, for 
example hive-beeline, spark-sql, or spark beeline? These queries should not 
fail: for Spark we have many test cases that run this query, so ideally it 
should not be an issue. Also, please share the create table command.






[GitHub] [carbondata] study-day commented on issue #4172: tez will report an error

2021-07-20 Thread GitBox


study-day commented on issue #4172:
URL: https://github.com/apache/carbondata/issues/4172#issuecomment-883224506


   Hi, thank you for your suggestion.
   You can try it in the Hive client (Tez engine); the error will happen there.






[GitHub] [carbondata] nihal0107 commented on issue #4173: DELETE FROM TABLE default.test_table WHERE SEGMENT.ID IN reported an error in benline

2021-07-20 Thread GitBox


nihal0107 commented on issue #4173:
URL: https://github.com/apache/carbondata/issues/4173#issuecomment-883222462


   Can you please share details of where you are running these queries, for 
example hive-beeline, spark-sql, or spark beeline? These queries should not 
fail: for Spark we have many test cases that run this query, so ideally it 
should not be an issue.






[GitHub] [carbondata] study-day commented on issue #4173: DELETE FROM TABLE default.test_table WHERE SEGMENT.ID IN reported an error in benline

2021-07-20 Thread GitBox


study-day commented on issue #4173:
URL: https://github.com/apache/carbondata/issues/4173#issuecomment-882976676


   Hi, `DELETE FROM default.test_table WHERE SEGMENT.ID IN (0,1);` also 
reported an error.
   Error info:
   Error: org.apache.spark.sql.AnalysisException: cannot resolve '`SEGMENT.ID`' 
given input columns: .line 1 pos 45;
   'Project ['tupleId]
   +- 'Filter 'SEGMENT.ID IN (0) ... 39 more fields]
 +- SubqueryAlias    38 more fields] CarbonDatasourceHadoopRelation 
(state=,code=0)






[jira] [Commented] (CARBONDATA-4207) MV data getting lost

2021-07-20 Thread Indhumathi (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17384056#comment-17384056
 ] 

Indhumathi commented on CARBONDATA-4207:


Hi Suyash,

Can you provide the CREATE MV SQL so that we can replicate the issue?

In the FULL REFRESH case, if a load (INSERT OVERWRITE) to the MV table is in 
progress and fails because of a system or application crash or failure, the MV 
will not have any data and will be disabled. You have to sync the data again 
using the Refresh MV command to enable it.

Also, let me know the reason for the insertion failure.

> MV data getting lost
> 
>
> Key: CARBONDATA-4207
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4207
> Project: CarbonData
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.0.1
>Reporter: suyash yadav
>Priority: Major
> Fix For: 2.0.1
>
>
> Hi Team,
> We have observed one more issue. We created a table and a timeseries MV 
> on it. We had loaded almost 15 hours of data into it, and when we were 
> loading the 16th hour of data the load failed for some reason and 
> caused the MV to go empty. Our MV now has zero rows. Could you please let us 
> know whether this is a bug or this is how it is supposed to work? Our MV does 
> not have any avg function, so ideally loading to the MV should have been 
> incremental, and in that case the MV should not have been impacted when the 
> subsequent hour's load to the main table failed. Please have a look into this 
> issue and let us know what information you need.
>  
> scala> spark.sql("insert into Flow_TS_2day_stats_04062021 select 
> start_time,end_time,source_ip_address,destintion_ip_address,appname,protocol_id,source_tos,src_as,dst_as,source_mask,destination_mask,dst_tos,input_pkt,output_pkt,input_byt,output_byt,source_port,destination_port,in_interface,out_interface
>  from Flow_TS_1day_stats_24052021  where start_time>='2021-03-04 07:00:00' 
> and start_time< '2021-03-04 09:00:00'").show()
>  
> scala> spark.sql("insert into Flow_TS_2day_stats_04062021 select 
> start_time,end_time,source_ip_address,destintion_ip_address,appname,protocol_id,source_tos,src_as,dst_as,source_mask,destination_mask,dst_tos,input_pkt,output_pkt,input_byt,output_byt,source_port,destination_port,in_interface,out_interface
>  from Flow_TS_1day_stats_24052021  where start_time>='2021-03-04 15:00:00' 
> and start_time< '2021-03-04 16:00:00'").show()
> 21/06/06 14:25:33 AUDIT audit: {"time":"June 6, 2021 2:25:33 PM IST","username":"root","opName":"INSERT INTO","opId":"4069819623887063","opStatus":"START"}
> 21/06/06 14:44:14 AUDIT audit: {"time":"June 6, 2021 2:44:14 PM IST","username":"root","opName":"INSERT INTO","opId":"4070940294400824","opStatus":"START"}
> 21/06/06 16:06:05 AUDIT audit: {"time":"June 6, 2021 4:06:05 PM IST","username":"root","opName":"INSERT INTO","opId":"4070940294400824","opStatus":"SUCCESS","opTime":"4911240 ms","table":"default.Interface_Level_Agg_10min_MV_04062021","extraInfo":{"SegmentId":"6","DataSize":"4.52GB","IndexSize":"108.27KB"}}
> 21/06/06 16:06:09 AUDIT audit: {"time":"June 6, 2021 4:06:09 PM IST","username":"root","opName":"INSERT INTO","opId":"4069819623887063","opStatus":"SUCCESS","opTime":"6036073 ms","table":"default.flow_ts_2day_stats_04062021","extraInfo":{"SegmentId":"6","DataSize":"12.37GB","IndexSize":"262.43KB"}}
> [^Stack_Trace]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] nihal0107 edited a comment on issue #4173: DELETE FROM TABLE default.test_table WHERE SEGMENT.ID IN reported an error in benline

2021-07-20 Thread GitBox


nihal0107 edited a comment on issue #4173:
URL: https://github.com/apache/carbondata/issues/4173#issuecomment-882401423










[GitHub] [carbondata] nihal0107 commented on issue #4172: tez will report an error

2021-07-20 Thread GitBox


nihal0107 commented on issue #4172:
URL: https://github.com/apache/carbondata/issues/4172#issuecomment-882402060


   If you are not sure about the issue, could you please close it?






[GitHub] [carbondata] nihal0107 commented on issue #4173: DELETE FROM TABLE default.test_table WHERE SEGMENT.ID IN reported an error in benline

2021-07-20 Thread GitBox


nihal0107 commented on issue #4173:
URL: https://github.com/apache/carbondata/issues/4173#issuecomment-882401423










[GitHub] [carbondata] study-day opened a new issue #4178: how to use MERGE INTO

2021-07-20 Thread GitBox


study-day opened a new issue #4178:
URL: https://github.com/apache/carbondata/issues/4178


   Support MERGE INTO SQL Syntax
   CarbonData now supports MERGE INTO SQL syntax along with the API support. 
This will help the users to write CDC job and merge job using SQL also now.
   
   How do I use MERGE INTO? 
   Please add it to the user documentation.






[jira] [Commented] (CARBONDATA-4165) Carbondata summing up two values of same timestamp.

2021-07-20 Thread SHREELEKHYA GAMPA (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383867#comment-17383867
 ] 

SHREELEKHYA GAMPA commented on CARBONDATA-4165:
---

Hi Suyash,

Could you please share more details of the problem, along with some example queries?

 

> Carbondata summing up two values of same timestamp.
> ---
>
> Key: CARBONDATA-4165
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4165
> Project: CarbonData
>  Issue Type: Wish
>  Components: core
>Affects Versions: 2.0.1
> Environment: apache carbondata 2.0.1, apache spark 2.4.5 hadoop 2.7.2
>Reporter: suyash yadav
>Priority: Major
> Fix For: 2.0.1
>
>
> Hi Team,
>  
> We have seen a behaviour while using CarbonData 2.0.1: if we get 2 values 
> for the same timestamp, it sums both values and stores them as one 
> value. Instead, we need it to discard the previous value and use the 
> latest one.
>  
> Please let us know if there is any functionality already available in 
> CarbonData to handle duplicate values by itself, or if there is any plan to 
> implement such functionality.


