[GitHub] carbondata pull request #3047: [CARBONDATA-3223] Fixed Wrong Datasize and In...

2019-01-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/3047


---


[GitHub] carbondata pull request #3047: [CARBONDATA-3223] Fixed Wrong Datasize and In...

2019-01-03 Thread manishnalla1994
Github user manishnalla1994 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3047#discussion_r245003004
  
--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/api/CarbonStore.scala ---
@@ -101,14 +102,23 @@ object CarbonStore {
       val (dataSize, indexSize) = if (load.getFileFormat == FileFormat.ROW_V1) {
         // for streaming segment, we should get the actual size from the index file
         // since it is continuously inserting data
-        val segmentDir = CarbonTablePath.getSegmentPath(tablePath, load.getLoadName)
+        val segmentDir = CarbonTablePath
+          .getSegmentPath(carbonTable.getTablePath, load.getLoadName)
         val indexPath = CarbonTablePath.getCarbonStreamIndexFilePath(segmentDir)
         val indices = StreamSegment.readIndexFile(indexPath, FileFactory.getFileType(indexPath))
         (indices.asScala.map(_.getFile_size).sum, FileFactory.getCarbonFile(indexPath).getSize)
       } else {
         // for batch segment, we can get the data size from table status file directly
-        (if (load.getDataSize == null) 0L else load.getDataSize.toLong,
-          if (load.getIndexSize == null) 0L else load.getIndexSize.toLong)
+        if (null == load.getDataSize || null == load.getIndexSize) {
+          // If either datasize or indexsize comes back null, then we calculate the correct
+          // size and assign it
+          val dataIndexSize = CarbonUtil.calculateDataIndexSize(carbonTable, true)
--- End diff --

As it is a metadata function, we compute the sizes just once and save them by passing TRUE to 'calculateDataIndexSize'. The computed value can then be reused afterwards as well.
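
A minimal sketch of the compute-once behaviour described here, assuming the usual CarbonData core packages and mirroring the calls shown in the diff (the helper name is illustrative):

    import org.apache.carbondata.core.constants.CarbonCommonConstants
    import org.apache.carbondata.core.metadata.schema.table.CarbonTable
    import org.apache.carbondata.core.util.CarbonUtil

    // Sketch: with the flag set to true the sizes are computed once and
    // saved back to the table status file, so later reads can reuse the
    // stored values instead of recomputing them.
    def computeAndSaveSizes(carbonTable: CarbonTable): (Long, Long) = {
      val dataIndexSize = CarbonUtil.calculateDataIndexSize(carbonTable, true)
      (dataIndexSize.get(CarbonCommonConstants.CARBON_TOTAL_DATA_SIZE).toLong,
        dataIndexSize.get(CarbonCommonConstants.CARBON_TOTAL_INDEX_SIZE).toLong)
    }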


---


[GitHub] carbondata pull request #3047: [CARBONDATA-3223] Fixed Wrong Datasize and In...

2019-01-03 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3047#discussion_r244980360
  
--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/api/CarbonStore.scala ---
@@ -101,14 +102,23 @@ object CarbonStore {
       val (dataSize, indexSize) = if (load.getFileFormat == FileFormat.ROW_V1) {
         // for streaming segment, we should get the actual size from the index file
         // since it is continuously inserting data
-        val segmentDir = CarbonTablePath.getSegmentPath(tablePath, load.getLoadName)
+        val segmentDir = CarbonTablePath
+          .getSegmentPath(carbonTable.getTablePath, load.getLoadName)
         val indexPath = CarbonTablePath.getCarbonStreamIndexFilePath(segmentDir)
         val indices = StreamSegment.readIndexFile(indexPath, FileFactory.getFileType(indexPath))
         (indices.asScala.map(_.getFile_size).sum, FileFactory.getCarbonFile(indexPath).getSize)
       } else {
         // for batch segment, we can get the data size from table status file directly
-        (if (load.getDataSize == null) 0L else load.getDataSize.toLong,
-          if (load.getIndexSize == null) 0L else load.getIndexSize.toLong)
+        if (null == load.getDataSize || null == load.getIndexSize) {
+          // If either datasize or indexsize comes back null, then we calculate the correct
+          // size and assign it
+          val dataIndexSize = CarbonUtil.calculateDataIndexSize(carbonTable, true)
--- End diff --

Show Segments is a read-only query; I think we should not perform a write operation inside a query. So I feel it is better to either calculate the sizes every time and show them, OR just display them as not available.
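
A hedged sketch of the two read-only options mentioned above: recompute for display with the flag false (so nothing is written back), or show the sizes as not available. The helper name and the recompute switch are illustrative:

    import org.apache.carbondata.core.constants.CarbonCommonConstants
    import org.apache.carbondata.core.metadata.schema.table.CarbonTable
    import org.apache.carbondata.core.statusmanager.LoadMetadataDetails
    import org.apache.carbondata.core.util.CarbonUtil

    // Keep SHOW SEGMENTS read-only: either recompute the sizes for this
    // call only (flag = false writes nothing back) or report them as NA.
    def displaySizes(carbonTable: CarbonTable, load: LoadMetadataDetails,
        recompute: Boolean): (String, String) = {
      if (load.getDataSize == null || load.getIndexSize == null) {
        if (recompute) {
          val sizes = CarbonUtil.calculateDataIndexSize(carbonTable, false)
          (sizes.get(CarbonCommonConstants.CARBON_TOTAL_DATA_SIZE).toString,
            sizes.get(CarbonCommonConstants.CARBON_TOTAL_INDEX_SIZE).toString)
        } else {
          ("NA", "NA")
        }
      } else {
        (load.getDataSize, load.getIndexSize)
      }
    }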


---


[GitHub] carbondata pull request #3047: [CARBONDATA-3223] Fixed Wrong Datasize and In...

2019-01-03 Thread manishnalla1994
Github user manishnalla1994 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3047#discussion_r244957746
  
--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/api/CarbonStore.scala ---
@@ -101,14 +102,23 @@ object CarbonStore {
       val (dataSize, indexSize) = if (load.getFileFormat == FileFormat.ROW_V1) {
         // for streaming segment, we should get the actual size from the index file
         // since it is continuously inserting data
-        val segmentDir = CarbonTablePath.getSegmentPath(tablePath, load.getLoadName)
+        val segmentDir = CarbonTablePath
+          .getSegmentPath(carbonTable.getTablePath, load.getLoadName)
         val indexPath = CarbonTablePath.getCarbonStreamIndexFilePath(segmentDir)
         val indices = StreamSegment.readIndexFile(indexPath, FileFactory.getFileType(indexPath))
         (indices.asScala.map(_.getFile_size).sum, FileFactory.getCarbonFile(indexPath).getSize)
       } else {
         // for batch segment, we can get the data size from table status file directly
-        (if (load.getDataSize == null) 0L else load.getDataSize.toLong,
-          if (load.getIndexSize == null) 0L else load.getIndexSize.toLong)
+        if (null == load.getDataSize || null == load.getIndexSize) {
+          // If either datasize or indexsize comes back null, then we calculate the correct
+          // size and assign it
+          val dataIndexSize = CarbonUtil.calculateDataIndexSize(carbonTable, false)
--- End diff --

Fixed.



---


[GitHub] carbondata pull request #3047: [CARBONDATA-3223] Fixed Wrong Datasize and In...

2019-01-03 Thread manishnalla1994
Github user manishnalla1994 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3047#discussion_r244957693
  
--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/api/CarbonStore.scala ---
@@ -46,9 +47,9 @@ object CarbonStore {
 
   def showSegments(
       limit: Option[String],
-      tablePath: String,
+      carbonTable: CarbonTable,
--- End diff --

Done.


---


[GitHub] carbondata pull request #3047: [CARBONDATA-3223] Fixed Wrong Datasize and In...

2019-01-02 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3047#discussion_r244920921
  
--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/api/CarbonStore.scala ---
@@ -46,9 +47,9 @@ object CarbonStore {
 
   def showSegments(
       limit: Option[String],
-      tablePath: String,
+      carbonTable: CarbonTable,
--- End diff --

Move `carbonTable` to be the first argument of the method.


---


[GitHub] carbondata pull request #3047: [CARBONDATA-3223] Fixed Wrong Datasize and In...

2019-01-02 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3047#discussion_r244922117
  
--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/api/CarbonStore.scala ---
@@ -101,14 +102,23 @@ object CarbonStore {
       val (dataSize, indexSize) = if (load.getFileFormat == FileFormat.ROW_V1) {
         // for streaming segment, we should get the actual size from the index file
         // since it is continuously inserting data
-        val segmentDir = CarbonTablePath.getSegmentPath(tablePath, load.getLoadName)
+        val segmentDir = CarbonTablePath
+          .getSegmentPath(carbonTable.getTablePath, load.getLoadName)
         val indexPath = CarbonTablePath.getCarbonStreamIndexFilePath(segmentDir)
         val indices = StreamSegment.readIndexFile(indexPath, FileFactory.getFileType(indexPath))
         (indices.asScala.map(_.getFile_size).sum, FileFactory.getCarbonFile(indexPath).getSize)
       } else {
         // for batch segment, we can get the data size from table status file directly
-        (if (load.getDataSize == null) 0L else load.getDataSize.toLong,
-          if (load.getIndexSize == null) 0L else load.getIndexSize.toLong)
+        if (null == load.getDataSize || null == load.getIndexSize) {
+          // If either datasize or indexsize comes back null, then we calculate the correct
+          // size and assign it
+          val dataIndexSize = CarbonUtil.calculateDataIndexSize(carbonTable, false)
--- End diff --

The boolean flag in the method call controls whether the computed data and index size are updated in the table status file. Pass the flag as true so that it computes the sizes and updates the table status file; this will avoid recalculating them on every Show Segments call.
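
A short sketch of the assumed flag semantics (the wrapper name is illustrative):

    import org.apache.carbondata.core.metadata.schema.table.CarbonTable
    import org.apache.carbondata.core.util.CarbonUtil

    // Assumed semantics of the boolean argument, per the comment above:
    //   persist = false -> compute the sizes for this call only
    //   persist = true  -> compute the sizes and also update the table
    //                      status file so later calls can reuse them
    def segmentSizes(carbonTable: CarbonTable, persist: Boolean) =
      CarbonUtil.calculateDataIndexSize(carbonTable, persist)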


---


[GitHub] carbondata pull request #3047: [CARBONDATA-3223] Fixed Wrong Datasize and In...

2019-01-02 Thread manishnalla1994
Github user manishnalla1994 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3047#discussion_r244911752
  
--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/api/CarbonStore.scala ---
@@ -101,14 +102,21 @@ object CarbonStore {
       val (dataSize, indexSize) = if (load.getFileFormat == FileFormat.ROW_V1) {
         // for streaming segment, we should get the actual size from the index file
         // since it is continuously inserting data
-        val segmentDir = CarbonTablePath.getSegmentPath(tablePath, load.getLoadName)
+        val segmentDir = CarbonTablePath
+          .getSegmentPath(carbonTable.getTablePath, load.getLoadName)
         val indexPath = CarbonTablePath.getCarbonStreamIndexFilePath(segmentDir)
         val indices = StreamSegment.readIndexFile(indexPath, FileFactory.getFileType(indexPath))
         (indices.asScala.map(_.getFile_size).sum, FileFactory.getCarbonFile(indexPath).getSize)
       } else {
         // for batch segment, we can get the data size from table status file directly
-        (if (load.getDataSize == null) 0L else load.getDataSize.toLong,
-          if (load.getIndexSize == null) 0L else load.getIndexSize.toLong)
+        if (null == load.getDataSize && null == load.getIndexSize) {
+          val dataIndexSize = CarbonUtil.calculateDataIndexSize(carbonTable, false)
+          (dataIndexSize.get(CarbonCommonConstants.CARBON_TOTAL_DATA_SIZE).toLong,
+            dataIndexSize.get(CarbonCommonConstants.CARBON_TOTAL_INDEX_SIZE).toLong)
+        } else {
+          (load.getDataSize.toLong,
--- End diff --

Yes, fixed it now.


---


[GitHub] carbondata pull request #3047: [CARBONDATA-3223] Fixed Wrong Datasize and In...

2019-01-02 Thread qiuchenjian
Github user qiuchenjian commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3047#discussion_r244895354
  
--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/api/CarbonStore.scala ---
@@ -101,14 +102,21 @@ object CarbonStore {
       val (dataSize, indexSize) = if (load.getFileFormat == FileFormat.ROW_V1) {
         // for streaming segment, we should get the actual size from the index file
         // since it is continuously inserting data
-        val segmentDir = CarbonTablePath.getSegmentPath(tablePath, load.getLoadName)
+        val segmentDir = CarbonTablePath
+          .getSegmentPath(carbonTable.getTablePath, load.getLoadName)
         val indexPath = CarbonTablePath.getCarbonStreamIndexFilePath(segmentDir)
         val indices = StreamSegment.readIndexFile(indexPath, FileFactory.getFileType(indexPath))
         (indices.asScala.map(_.getFile_size).sum, FileFactory.getCarbonFile(indexPath).getSize)
       } else {
         // for batch segment, we can get the data size from table status file directly
-        (if (load.getDataSize == null) 0L else load.getDataSize.toLong,
-          if (load.getIndexSize == null) 0L else load.getIndexSize.toLong)
+        if (null == load.getDataSize && null == load.getIndexSize) {
+          val dataIndexSize = CarbonUtil.calculateDataIndexSize(carbonTable, false)
+          (dataIndexSize.get(CarbonCommonConstants.CARBON_TOTAL_DATA_SIZE).toLong,
+            dataIndexSize.get(CarbonCommonConstants.CARBON_TOTAL_INDEX_SIZE).toLong)
+        } else {
+          (load.getDataSize.toLong,
--- End diff --

If only one of load.getDataSize and load.getIndexSize is null, it will throw an exception; I think this scenario should be considered.
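
A minimal sketch of the null-safe handling being asked for: with `&&`, a row where only one field is null falls through to the else branch and `toLong` is applied to a null String, which throws at runtime. Checking both fields (as the later `||` revision of the diff does) avoids that; the helper name is illustrative:

    import org.apache.carbondata.core.constants.CarbonCommonConstants
    import org.apache.carbondata.core.metadata.schema.table.CarbonTable
    import org.apache.carbondata.core.statusmanager.LoadMetadataDetails
    import org.apache.carbondata.core.util.CarbonUtil

    // Recalculate whenever either size is missing, so that toLong is
    // never applied to a null value.
    def nullSafeSizes(carbonTable: CarbonTable,
        load: LoadMetadataDetails): (Long, Long) = {
      val bothPresent = load.getDataSize != null && load.getIndexSize != null
      if (bothPresent) {
        (load.getDataSize.toLong, load.getIndexSize.toLong)
      } else {
        val sizes = CarbonUtil.calculateDataIndexSize(carbonTable, false)
        (sizes.get(CarbonCommonConstants.CARBON_TOTAL_DATA_SIZE).toLong,
          sizes.get(CarbonCommonConstants.CARBON_TOTAL_INDEX_SIZE).toLong)
      }
    }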


---


[GitHub] carbondata pull request #3047: [CARBONDATA-3223] Fixed Wrong Datasize and In...

2019-01-02 Thread manishnalla1994
GitHub user manishnalla1994 opened a pull request:

https://github.com/apache/carbondata/pull/3047

[CARBONDATA-3223] Fixed Wrong Datasize and Indexsize calculation for old store using Show Segments

Problem: A table created and loaded on an older version (1.1) showed a data-size and index-size of 0B when refreshed on a new version. This was because, when the data-size came back as "null", we were not computing it but directly assigning it the value 0.

Solution: Computed the correct data-size and index-size using the CarbonTable.
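
A hedged reproduction sketch; the session bootstrap is abbreviated and the table name is hypothetical, but SHOW SEGMENTS is the command whose Data Size / Index Size columns were affected:

    import org.apache.spark.sql.SparkSession

    // Hypothetical reproduction: register a table written by CarbonData
    // 1.1 on the new version, then list its segments. Before this fix the
    // size columns showed 0B because a null size was replaced with 0
    // instead of being computed.
    val spark = SparkSession.builder().appName("segment-size-check").getOrCreate()
    spark.sql("REFRESH TABLE old_store_table")
    spark.sql("SHOW SEGMENTS FOR TABLE old_store_table").show(false)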
  
Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

 - [ ] Any interfaces changed?

 - [ ] Any backward compatibility impacted?

 - [ ] Document update required?

 - [x] Testing done
       Please provide details on
       - Whether new unit test cases have been added or why no new tests are required?
       - How it is tested? Please attach test report.
       - Is it a performance related change? Please attach the performance test report.
       - Any additional information to help reviewers in testing this change.

 - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/manishnalla1994/carbondata Datasize0Issue

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/3047.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3047


commit 6bf65d7a0b42e8d9a822fd234a510550bd8d2f17
Author: manishnalla1994 
Date:   2019-01-02T12:30:36Z

Fixed Wrong Datasize and Indexsize calculation for old store




---