[jira] [Commented] (KYLIN-2386) Revert KYLIN-2349 and KYLIN-2353

2017-01-13 Thread Daniel Lemire (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822173#comment-15822173
 ] 

Daniel Lemire commented on KYLIN-2386:
--

We would be willing to expose static functions that compute the cardinality and 
the serialized size directly within the RoaringBitmap library without needing 
any memory allocation whatsoever, just data access.

The code is already there, it is just private: 

https://github.com/RoaringBitmap/RoaringBitmap/blob/master/src/main/java/org/roaringbitmap/buffer/ImmutableRoaringArray.java#L131-L147

If you are interested, just ping me with the desired function signatures.
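
As a rough illustration of the idea in this comment, the sketch below sums the per-container cardinalities stored in the header of a serialized roaring bitmap, using only absolute reads on the ByteBuffer. It follows the layout described in the RoaringFormatSpec (cookie, optional run-flag bitset, then one 16-bit key and one 16-bit "cardinality minus one" per container). The class and method names are hypothetical and are not the functions that RoaringBitmap exposes or plans to expose.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper for illustration; not part of RoaringBitmap or Kylin. */
final class RoaringCardinalitySketch {
    // Cookie values defined by the RoaringFormatSpec.
    static final int SERIAL_COOKIE_NO_RUNCONTAINER = 12346;
    static final int SERIAL_COOKIE = 12347;

    /** Reads an unsigned little-endian 16-bit value at an absolute index. */
    private static int u16(ByteBuffer b, int i) {
        return (b.get(i) & 0xFF) | ((b.get(i + 1) & 0xFF) << 8);
    }

    /** Reads a little-endian 32-bit value at an absolute index. */
    private static int i32(ByteBuffer b, int i) {
        return u16(b, i) | (u16(b, i + 2) << 16);
    }

    /** Sums the per-container cardinalities stored in the descriptive header. */
    static long cardinality(ByteBuffer serialized) {
        int start = serialized.position();
        int cookie = i32(serialized, start);
        int containers;
        int descriptors; // index of the first (key, cardinality - 1) pair
        if ((cookie & 0xFFFF) == SERIAL_COOKIE) {
            containers = (cookie >>> 16) + 1;
            descriptors = start + 4 + (containers + 7) / 8; // skip run-flag bitset
        } else if (cookie == SERIAL_COOKIE_NO_RUNCONTAINER) {
            containers = i32(serialized, start + 4);
            descriptors = start + 8;
        } else {
            throw new IllegalArgumentException("unexpected cookie: " + cookie);
        }
        long total = 0;
        for (int i = 0; i < containers; i++) {
            total += u16(serialized, descriptors + 4 * i + 2) + 1;
        }
        return total;
    }

    public static void main(String[] args) {
        // Header of a bitmap with two containers holding 3 and 1 values.
        ByteBuffer header = ByteBuffer.allocate(16).order(ByteOrder.LITTLE_ENDIAN);
        header.putInt(SERIAL_COOKIE_NO_RUNCONTAINER).putInt(2);
        header.putShort((short) 0).putShort((short) 2); // key 0, cardinality - 1 = 2
        header.putShort((short) 1).putShort((short) 0); // key 1, cardinality - 1 = 0
        header.flip();
        System.out.println(cardinality(header)); // prints 4
    }
}
```

The reads are absolute, so the caller's buffer position and byte order are left untouched and no intermediate objects are created.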

> Revert KYLIN-2349 and KYLIN-2353
> 
>
> Key: KYLIN-2386
> URL: https://issues.apache.org/jira/browse/KYLIN-2386
> Project: Kylin
> Issue Type: Task
> Components: Metadata
> Affects Versions: v2.0.0
> Reporter: Dayue Gao
> Assignee: Dayue Gao
> Fix For: v2.0.0
>
>
> In KYLIN-2349 and KYLIN-2353, we changed the storage format of BitmapCounter 
> for better performance. In the new format, cardinality and serialized size 
> are recorded in the header part. This enables us to retrieve that 
> information without deserializing the whole data.
> However, cardinality and serialized size can be quickly calculated just from 
> the header of [roaring 
> format|https://github.com/RoaringBitmap/RoaringFormatSpec/]. Performance 
> tests show that we could achieve the same performance gain without the format 
> change. The benefits are
> * there is no need for users to rebuild existing cubes to get better performance
> * there is no need for developers to maintain two formats and deal with 
> compatibility issues



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KYLIN-2217) Reducers build dictionaries locally

2017-01-13 Thread Shaofeng SHI (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822638#comment-15822638
 ] 

Shaofeng SHI commented on KYLIN-2217:
-

Scanning the fact table twice is costly, and we should avoid it. I think the 
dictionaries can be merged (on the job node) after they are built in the 
reducers. The memory footprint of merging is much smaller than that of 
building, so it is acceptable for the job node. Would this be better?
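
A minimal sketch of why the merge step can stay small in memory: if each reducer emits its distinct values in sorted order, a k-way merge only needs to hold one head value per reducer output at a time. This assumes each partial dictionary can be read back as a sorted iterator of strings; Kylin's real dictionaries are Trie-based, and the class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.Consumer;

/** Hypothetical illustration; not Kylin's dictionary merge code. */
final class DictionaryMergeSketch {
    /** One cursor per reducer output; holds only the current head value. */
    private static final class Cursor implements Comparable<Cursor> {
        String head;
        final Iterator<String> rest;

        Cursor(Iterator<String> it) {
            this.rest = it;
            this.head = it.next();
        }

        @Override
        public int compareTo(Cursor other) {
            return head.compareTo(other.head);
        }
    }

    /** Streams the sorted, de-duplicated union of all parts into the sink. */
    static void merge(List<Iterator<String>> sortedParts, Consumer<String> sink) {
        PriorityQueue<Cursor> heap = new PriorityQueue<>();
        for (Iterator<String> part : sortedParts) {
            if (part.hasNext()) {
                heap.add(new Cursor(part));
            }
        }
        String last = null;
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();
            if (!c.head.equals(last)) { // drop values already emitted
                last = c.head;
                sink.accept(last);
            }
            if (c.rest.hasNext()) {
                c.head = c.rest.next();
                heap.add(c);
            }
        }
    }

    public static void main(String[] args) {
        List<String> merged = new ArrayList<>();
        merge(Arrays.asList(
                Arrays.asList("apple", "cherry").iterator(),
                Arrays.asList("banana", "cherry", "date").iterator()),
              merged::add);
        System.out.println(merged); // prints [apple, banana, cherry, date]
    }
}
```

The example collects the output into a list only for display; a real merge would feed the sink straight into whatever builds the final dictionary.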

> Reducers build dictionaries locally
> ---
>
> Key: KYLIN-2217
> URL: https://issues.apache.org/jira/browse/KYLIN-2217
> Project: Kylin
> Issue Type: Improvement
> Affects Versions: v1.5.4.1
> Reporter: XIE FAN
> Assignee: XIE FAN
> Fix For: v2.0.0
>
> Attachments: 0001-KYLIN-2217-Reducers-build-dictionaries-locally.patch
>
>
> In KYLIN-1851, we reduced the peak memory usage of the dictionary-building 
> procedure by splitting a single Trie tree structure into a Trie forest. But 
> there is still a bottleneck: all the dictionaries are built in the Kylin 
> client. In this issue, we want to use multiple reducers to build different 
> dictionaries locally and concurrently, which can further reduce the peak 
> memory usage as well as speed up the dictionary-building procedure.





[jira] [Updated] (KYLIN-2387) A new BitmapCounter with better performance

2017-01-13 Thread Dayue Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dayue Gao updated KYLIN-2387:
-
Summary: A new BitmapCounter with better performance  (was: Improve 
BitmapCounter performance by avoiding expensive deserialize)

> A new BitmapCounter with better performance
> ---
>
> Key: KYLIN-2387
> URL: https://issues.apache.org/jira/browse/KYLIN-2387
> Project: Kylin
> Issue Type: Improvement
> Components: Metadata, Query Engine, Storage - HBase
> Affects Versions: v2.0.0
> Reporter: Dayue Gao
> Assignee: Dayue Gao
>






[jira] [Updated] (KYLIN-2387) A new BitmapCounter with better performance

2017-01-13 Thread Dayue Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dayue Gao updated KYLIN-2387:
-
Description: 
We found the old BitmapCounter does not perform very well on very large bitmap. 
The inefficiency 
* 

> A new BitmapCounter with better performance
> ---
>
> Key: KYLIN-2387
> URL: https://issues.apache.org/jira/browse/KYLIN-2387
> Project: Kylin
> Issue Type: Improvement
> Components: Metadata, Query Engine, Storage - HBase
> Affects Versions: v2.0.0
> Reporter: Dayue Gao
> Assignee: Dayue Gao
>
> We found the old BitmapCounter does not perform very well on very large 
> bitmap. The inefficiency 
> * 





[jira] [Updated] (KYLIN-2386) Revert KYLIN-2349 and KYLIN-2353

2017-01-13 Thread Dayue Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dayue Gao updated KYLIN-2386:
-
Description: 
In KYLIN-2349 and KYLIN-2353, we optimized the performance of BitmapCounter by 
changing its storage format. Cardinality and serialized size are recorded in 
the header of the new format, which enables us to retrieve that information 
without deserializing the data.

In fact, cardinality and serialized size can be quickly calculated just from 
the header of the [roaring 
format|https://github.com/RoaringBitmap/RoaringFormatSpec/]. Performance tests 
show that we could achieve the same performance boost without changing the 
format of BitmapCounter. The benefits are
* there is no need for users to rebuild existing cubes to get better performance
* there is no need for developers to maintain two formats and deal with 
compatibility issues

  was:
In KYLIN-2349 and KYLIN-2353, we optimized performance of BitmapCounter by 
changing its storage format. Cardinality and serialized size are recorded in 
the header of the new format, enables us to retrieve those information without 
deserialize the data.

In fact, cardinality and serialized size can be quickly calculated just from 
the header of [roaring 
format|https://github.com/RoaringBitmap/RoaringFormatSpec/]. Performance tests 
show that we could achieve the same performance boost without the format change 
of BitmapCounter. The benefits are
* no need to rebuild existing cube to get better performance
* no need to maintain two formats and deal with compatibility issues


> Revert KYLIN-2349 and KYLIN-2353
> 
>
> Key: KYLIN-2386
> URL: https://issues.apache.org/jira/browse/KYLIN-2386
> Project: Kylin
> Issue Type: Task
> Components: Metadata
> Affects Versions: v2.0.0
> Reporter: Dayue Gao
> Assignee: Dayue Gao





[jira] [Created] (KYLIN-2386) Revert KYLIN-2349 and KYLIN-2353

2017-01-13 Thread Dayue Gao (JIRA)
Dayue Gao created KYLIN-2386:


Summary: Revert KYLIN-2349 and KYLIN-2353
Key: KYLIN-2386
URL: https://issues.apache.org/jira/browse/KYLIN-2386
Project: Kylin
Issue Type: Task
Components: Metadata
Affects Versions: v2.0.0
Reporter: Dayue Gao
Assignee: Dayue Gao


In KYLIN-2349 and KYLIN-2353, we optimized the performance of BitmapCounter by 
changing its storage format. Cardinality and serialized size are recorded in 
the header of the new format, which enables us to retrieve that information 
without deserializing the data.

In fact, cardinality and serialized size can be quickly calculated just from 
the header of the [roaring 
format|https://github.com/RoaringBitmap/RoaringFormatSpec/]. Performance tests 
show that we could achieve the same performance boost without changing the 
format of BitmapCounter. The benefits are
* no need to rebuild existing cubes to get better performance
* no need to maintain two formats and deal with compatibility issues





[jira] [Created] (KYLIN-2387) Improve BitmapCounter performance by avoiding expensive deserialize

2017-01-13 Thread Dayue Gao (JIRA)
Dayue Gao created KYLIN-2387:


Summary: Improve BitmapCounter performance by avoiding expensive deserialize
Key: KYLIN-2387
URL: https://issues.apache.org/jira/browse/KYLIN-2387
Project: Kylin
Issue Type: Improvement
Components: Metadata, Query Engine, Storage - HBase
Affects Versions: v2.0.0
Reporter: Dayue Gao
Assignee: Dayue Gao








[jira] [Created] (KYLIN-2388) Hot load kylin config from web

2017-01-13 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2388:
-

Summary: Hot load kylin config from web
Key: KYLIN-2388
URL: https://issues.apache.org/jira/browse/KYLIN-2388
Project: Kylin
Issue Type: New Feature
Components: Web
Affects Versions: v1.6.0
Reporter: kangkaisen
Assignee: kangkaisen
Fix For: v2.0.0


Allow admin users to reload the Kylin config from the web, which could improve 
operational efficiency and service stability.





[jira] [Resolved] (KYLIN-2386) Revert KYLIN-2349 and KYLIN-2353

2017-01-13 Thread Dayue Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dayue Gao resolved KYLIN-2386.
--
   Resolution: Fixed
Fix Version/s: v2.0.0

commit 
https://github.com/apache/kylin/commit/4b977215186281908a8c29741128242146a2b934

> Revert KYLIN-2349 and KYLIN-2353
> 
>
> Key: KYLIN-2386
> URL: https://issues.apache.org/jira/browse/KYLIN-2386
> Project: Kylin
> Issue Type: Task
> Components: Metadata
> Affects Versions: v2.0.0
> Reporter: Dayue Gao
> Assignee: Dayue Gao
> Fix For: v2.0.0
>
>
> In KYLIN-2349 and KYLIN-2353, we changed the storage format of BitmapCounter 
> for better performance. In the new format, cardinality and serialized size 
> are recorded in the header part. This enables us to retrieve that 
> information without deserializing the whole data.
> However, cardinality and serialized size can be quickly calculated just from 
> the header of [roaring 
> format|https://github.com/RoaringBitmap/RoaringFormatSpec/]. Performance 
> tests show that we could achieve the same performance gain without the format 
> change. The benefits are
> * there is no need for users to rebuild existing cubes to get better performance
> * there is no need for developers to maintain two formats and deal with 
> compatibility issues





[jira] [Updated] (KYLIN-2386) Revert KYLIN-2349 and KYLIN-2353

2017-01-13 Thread Dayue Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dayue Gao updated KYLIN-2386:
-
Description: 
In KYLIN-2349 and KYLIN-2353, we changed the storage format of BitmapCounter 
for better performance. In the new format, cardinality and serialized size are 
recorded in the header part. This enables us to retrieve that information 
without deserializing the whole data.

However, cardinality and serialized size can be quickly calculated just from 
the header of the [roaring 
format|https://github.com/RoaringBitmap/RoaringFormatSpec/]. Performance tests 
show that we could achieve the same performance gain without the format change. 
The benefits are
* there is no need for users to rebuild existing cubes to get better performance
* there is no need for developers to maintain two formats and deal with 
compatibility issues

  was:
In KYLIN-2349 and KYLIN-2353, we optimized performance of BitmapCounter by 
changing its storage format. Cardinality and serialized size are recorded in 
the header of the new format, enables us to retrieve those information without 
deserialize the data.

In fact, cardinality and serialized size can be quickly calculated just from 
the header of [roaring 
format|https://github.com/RoaringBitmap/RoaringFormatSpec/]. Performance tests 
show that we could achieve the same performance boost without the format change 
of BitmapCounter. The benefits are
* there is no need for user to rebuild existing cube to get better performance
* there is no need for developer to maintain two formats and deal with 
compatibility issues


> Revert KYLIN-2349 and KYLIN-2353
> 
>
> Key: KYLIN-2386
> URL: https://issues.apache.org/jira/browse/KYLIN-2386
> Project: Kylin
> Issue Type: Task
> Components: Metadata
> Affects Versions: v2.0.0
> Reporter: Dayue Gao
> Assignee: Dayue Gao





[jira] [Commented] (KYLIN-2217) Reducers build dictionaries locally

2017-01-13 Thread XIE FAN (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15821788#comment-15821788
 ] 

XIE FAN commented on KYLIN-2217:


Leaving the UHC dictionary building to the job engine is OK, but it may cause 
a single-point bottleneck. Actually, KYLIN-2217 is designed to remove this 
bottleneck. If we want to take advantage of both KYLIN-2217 and KYLIN-2135, 
there is another way: we can scan the fact table twice. In the first scan we 
learn the distribution of data in the UHC columns, so in the second scan we can 
split the values across multiple reducers and ensure the order between 
reducers based on the result of the first scan. This would resolve the 
conflict, but it may require a lot of modification.
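
The two-scan idea can be sketched as range partitioning: the first scan yields a sample of column values, split points are taken at its quantiles, and the second scan routes each value to a reducer by binary search so that every value in reducer i sorts before every value in reducer i+1. This is only an illustration under those assumptions; the class and method names are hypothetical and it uses Java's String ordering rather than whatever ordering Kylin's dictionaries require.

```java
import java.util.Arrays;

/** Hypothetical illustration; not Kylin's partitioner. */
final class RangePartitionSketch {
    /** Picks (reducers - 1) quantile boundaries from a sample taken in the first scan. */
    static String[] splitPoints(String[] sample, int reducers) {
        if (reducers < 1 || sample.length == 0) {
            throw new IllegalArgumentException("need a non-empty sample and at least one reducer");
        }
        String[] sorted = sample.clone();
        Arrays.sort(sorted);
        String[] splits = new String[reducers - 1];
        for (int i = 1; i < reducers; i++) {
            splits[i - 1] = sorted[(int) ((long) i * sorted.length / reducers)];
        }
        return splits;
    }

    /** Second scan: values below splits[0] go to reducer 0, and so on. */
    static int reducerFor(String value, String[] splits) {
        int idx = Arrays.binarySearch(splits, value);
        // binarySearch returns -(insertionPoint) - 1 when the value is absent
        return idx >= 0 ? idx + 1 : -idx - 1;
    }

    public static void main(String[] args) {
        String[] sample = {"d", "a", "c", "b", "f", "e"};
        String[] splits = splitPoints(sample, 3);
        System.out.println(Arrays.toString(splits)); // prints [c, e]
        System.out.println(reducerFor("a", splits)); // prints 0
        System.out.println(reducerFor("d", splits)); // prints 1
        System.out.println(reducerFor("z", splits)); // prints 2
    }
}
```

Because the mapping from value to reducer is monotonic, the per-reducer outputs can be concatenated in reducer order without a further global sort.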

> Reducers build dictionaries locally
> ---
>
> Key: KYLIN-2217
> URL: https://issues.apache.org/jira/browse/KYLIN-2217
> Project: Kylin
> Issue Type: Improvement
> Affects Versions: v1.5.4.1
> Reporter: XIE FAN
> Assignee: XIE FAN
> Fix For: v2.0.0
>
> Attachments: 0001-KYLIN-2217-Reducers-build-dictionaries-locally.patch
>
>
> In KYLIN-1851, we reduced the peak memory usage of the dictionary-building 
> procedure by splitting a single Trie tree structure into a Trie forest. But 
> there is still a bottleneck: all the dictionaries are built in the Kylin 
> client. In this issue, we want to use multiple reducers to build different 
> dictionaries locally and concurrently, which can further reduce the peak 
> memory usage as well as speed up the dictionary-building procedure.





[jira] [Updated] (KYLIN-2387) A new BitmapCounter with better performance

2017-01-13 Thread Dayue Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dayue Gao updated KYLIN-2387:
-
Description: 
We found that the old BitmapCounter does not perform well on very large 
bitmaps. The inefficiency comes from
* poor serialize implementation: instead of serializing the bitmap directly to 
a ByteBuffer, it uses a ByteArrayOutputStream as an intermediary, which causes 
superfluous memory allocations
* poor peekLength implementation: the whole bitmap is deserialized in order to 
retrieve its serialized size
* extra deserialization cost: even if only cardinality info is needed to answer 
the query, the whole bitmap is deserialized into a MutableRoaringBitmap

A new BitmapCounter is designed to solve these problems
* It comes in two flavors, mutable and immutable, which are based on 
MutableRoaringBitmap and ImmutableRoaringBitmap respectively
* ImmutableBitmapCounter has a lower deserialization cost because it just maps 
to a copied buffer. So we always deserialize to ImmutableBitmapCounter first, 
and convert it to MutableBitmapCounter when necessary
* peekLength is implemented using ImmutableRoaringBitmap, which is very fast 
since only the header of the roaring bitmap is examined
* It serializes directly to a ByteBuffer; no intermediate buffer is allocated
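
To make the peekLength point concrete, here is a standalone sketch that derives the serialized size from the layout in the RoaringFormatSpec without building any bitmap object. It is not the BitmapCounter code and does not call RoaringBitmap; the class name and constants are assumptions for illustration. For array and bitmap containers the size follows from the cardinality in the header alone; for run containers the sketch also reads the 16-bit run count at the start of each run container.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical illustration of a header-based peekLength; not Kylin code. */
final class PeekLengthSketch {
    static final int SERIAL_COOKIE_NO_RUNCONTAINER = 12346;
    static final int SERIAL_COOKIE = 12347;
    static final int NO_OFFSET_THRESHOLD = 4;       // offsets omitted below this when runs exist
    static final int ARRAY_MAX_CARDINALITY = 4096;  // larger containers are stored as bitmaps
    static final int BITMAP_CONTAINER_BYTES = 8192;

    private static int u16(ByteBuffer b, int i) {
        return (b.get(i) & 0xFF) | ((b.get(i + 1) & 0xFF) << 8);
    }

    private static int i32(ByteBuffer b, int i) {
        return u16(b, i) | (u16(b, i + 2) << 16);
    }

    /** Returns the number of bytes the bitmap starting at in.position() occupies. */
    static int peekLength(ByteBuffer in) {
        int start = in.position();
        int cookie = i32(in, start);
        boolean hasRun = (cookie & 0xFFFF) == SERIAL_COOKIE;
        int n;
        int runFlags = -1;
        int pos;
        if (hasRun) {
            n = (cookie >>> 16) + 1;
            runFlags = start + 4;
            pos = runFlags + (n + 7) / 8;
        } else if (cookie == SERIAL_COOKIE_NO_RUNCONTAINER) {
            n = i32(in, start + 4);
            pos = start + 8;
        } else {
            throw new IllegalArgumentException("unexpected cookie: " + cookie);
        }
        int descriptors = pos;
        pos += 4 * n; // descriptive header: key + (cardinality - 1) per container
        if (!hasRun || n >= NO_OFFSET_THRESHOLD) {
            pos += 4 * n; // offset header
        }
        for (int i = 0; i < n; i++) {
            boolean isRun = hasRun && (in.get(runFlags + i / 8) & (1 << (i % 8))) != 0;
            int cardinality = u16(in, descriptors + 4 * i + 2) + 1;
            if (isRun) {
                pos += 2 + 4 * u16(in, pos); // run count, then (start, length - 1) pairs
            } else if (cardinality <= ARRAY_MAX_CARDINALITY) {
                pos += 2 * cardinality;
            } else {
                pos += BITMAP_CONTAINER_BYTES;
            }
        }
        return pos - start;
    }

    public static void main(String[] args) {
        // Header only: one array container with 3 values (the 6 payload bytes are not read).
        ByteBuffer header = ByteBuffer.allocate(16).order(ByteOrder.LITTLE_ENDIAN);
        header.putInt(SERIAL_COOKIE_NO_RUNCONTAINER).putInt(1);
        header.putShort((short) 0).putShort((short) 2).putInt(16);
        header.flip();
        System.out.println(peekLength(header)); // prints 22
    }
}
```

In the example, 22 is 8 bytes of cookie and count, 4 bytes of descriptive header, 4 bytes of offset header, and 6 bytes for three 16-bit values.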

  was:
We found the old BitmapCounter does not perform very well on very large bitmap. 
The inefficiency 
* 


> A new BitmapCounter with better performance
> ---
>
> Key: KYLIN-2387
> URL: https://issues.apache.org/jira/browse/KYLIN-2387
> Project: Kylin
> Issue Type: Improvement
> Components: Metadata, Query Engine, Storage - HBase
> Affects Versions: v2.0.0
> Reporter: Dayue Gao
> Assignee: Dayue Gao





[jira] [Updated] (KYLIN-2387) A new BitmapCounter with better performance

2017-01-13 Thread Dayue Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dayue Gao updated KYLIN-2387:
-
Description: 
We found that the old BitmapCounter does not perform well on very large 
bitmaps. The inefficiency comes from
* Poor serialize implementation: instead of serializing the bitmap directly to 
a ByteBuffer, it uses a ByteArrayOutputStream as temporary storage, which 
causes superfluous memory allocations
* Poor peekLength implementation: the whole bitmap is deserialized in order to 
retrieve its serialized size
* Extra deserialization cost: even if only cardinality info is needed to answer 
the query, the whole bitmap is deserialized into a MutableRoaringBitmap

A new BitmapCounter is designed to solve these problems
* It comes in two flavors, mutable and immutable, which are based on 
Mutable/Immutable RoaringBitmap respectively
* ImmutableBitmapCounter has a lower deserialization cost because it just maps 
to a copied buffer. So we always deserialize to ImmutableBitmapCounter first, 
and convert it to MutableBitmapCounter only when necessary
* peekLength is implemented using ImmutableRoaringBitmap.serializedSizeInBytes, 
which is very fast since only the header of the roaring format is examined
* It can serialize directly to a ByteBuffer; no intermediate buffer is allocated
* The wire format is the same as before, see 
[RoaringFormatSpec|https://github.com/RoaringBitmap/RoaringFormatSpec/]. 
Therefore no cube rebuild is needed

  was:
We found the old BitmapCounter does not perform very well on very large bitmap. 
The inefficiency comes from
* Poor serialize implementation: instead of serialize bitmap directly to 
ByteBuffer, it uses ByteArrayOutputStream as a temporal storage, which causes 
superfluous memory allocations
* Poor peekLength implementation: the whole bitmap is deserialized in order to 
retrieve its serialized size
* Extra deserialize cost: even if only cardinality info is needed to answer 
query, the whole bitmap is deserialize into MutableRoaringBitmap

A new BitmapCounter is designed to solve these problems
* It comes in tow flavors, mutable and immutable, which is based on 
Mutable/Immutable RoaringBitmap correspondingly
* ImmutableBitmapCounter has lower deserialize cost, it just maps to a copied 
buffer. So we always deserialize to ImmutableBitmapCounter at first, and 
convert it to MutableBitmapCounter only when necessary
* peekLength is implemented using ImmutableRoaringBitmap.serializedSizeInBytes, 
which is very fast since only the header of roaring format is examined
* It can directly serializes to ByteBuffer, no intermediate buffer is allocated
* The wire format is unchanged


> A new BitmapCounter with better performance
> ---
>
> Key: KYLIN-2387
> URL: https://issues.apache.org/jira/browse/KYLIN-2387
> Project: Kylin
> Issue Type: Improvement
> Components: Metadata, Query Engine, Storage - HBase
> Affects Versions: v2.0.0
> Reporter: Dayue Gao
> Assignee: Dayue Gao





[jira] [Updated] (KYLIN-2387) A new BitmapCounter with better performance

2017-01-13 Thread Dayue Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dayue Gao updated KYLIN-2387:
-
Description: 
We found that the old BitmapCounter does not perform well on very large 
bitmaps. The inefficiency comes from
* Poor serialize implementation: instead of serializing the bitmap directly to 
a ByteBuffer, it uses a ByteArrayOutputStream as temporary storage, which 
causes superfluous memory allocations
* Poor peekLength implementation: the whole bitmap is deserialized in order to 
retrieve its serialized size
* Extra deserialization cost: even if only cardinality info is needed to answer 
the query, the whole bitmap is deserialized into a MutableRoaringBitmap

A new BitmapCounter is designed to solve these problems
* It comes in two flavors, mutable and immutable, which are based on 
Mutable/Immutable RoaringBitmap respectively
* ImmutableBitmapCounter has a lower deserialization cost because it just maps 
to a copied buffer. So we always deserialize to ImmutableBitmapCounter first, 
and convert it to MutableBitmapCounter only when necessary
* peekLength is implemented using ImmutableRoaringBitmap.serializedSizeInBytes, 
which is very fast since only the header of the roaring format is examined
* It can serialize directly to a ByteBuffer; no intermediate buffer is allocated
* The wire format is unchanged

  was:
We found the old BitmapCounter does not perform very well on very large bitmap. 
The inefficiency comes from
* Poor serialize implementation: instead of serialize bitmap directly to 
ByteBuffer, it uses ByteArrayOutputStream as a temporal storage, which causes 
superfluous memory allocations
* Poor peekLength implementation: the whole bitmap is deserialized in order to 
retrieve its serialized size
* Extra deserialize cost: even if only cardinality info is needed to answer 
query, the whole bitmap is deserialize into MutableRoaringBitmap

A new BitmapCounter is designed to solve these problems
* It comes in tow flavors, mutable and immutable, which is based on 
Mutable/Immutable RoaringBitmap correspondingly
* ImmutableBitmapCounter has lower deserialize cost, it just maps to a copied 
buffer. So we always deserialize to ImmutableBitmapCounter at first, and 
convert it to MutableBitmapCounter only when necessary
* peekLength is implemented using ImmutableRoaringBitmap.serializedSizeInBytes, 
which is very fast since only the header of roaring format is examined
* It can directly serializes to ByteBuffer, no intermediate buffer is allocated


> A new BitmapCounter with better performance
> ---
>
> Key: KYLIN-2387
> URL: https://issues.apache.org/jira/browse/KYLIN-2387
> Project: Kylin
> Issue Type: Improvement
> Components: Metadata, Query Engine, Storage - HBase
> Affects Versions: v2.0.0
> Reporter: Dayue Gao
> Assignee: Dayue Gao





[jira] [Updated] (KYLIN-2387) A new BitmapCounter with better performance

2017-01-13 Thread Dayue Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dayue Gao updated KYLIN-2387:
-
Description: 
We found that the old BitmapCounter does not perform well on very large 
bitmaps. The inefficiency comes from
* Poor serialize implementation: instead of serializing the bitmap directly to 
a ByteBuffer, it uses a ByteArrayOutputStream as temporary storage, which 
causes superfluous memory allocations
* Poor peekLength implementation: the whole bitmap is deserialized in order to 
retrieve its serialized size
* Extra deserialization cost: even if only cardinality info is needed to answer 
the query, the whole bitmap is deserialized into a MutableRoaringBitmap

A new BitmapCounter is designed to solve these problems
* It comes in two flavors, mutable and immutable, which are based on 
Mutable/Immutable RoaringBitmap respectively
* ImmutableBitmapCounter has a lower deserialization cost because it just maps 
to a copied buffer. So we always deserialize to ImmutableBitmapCounter first, 
and convert it to MutableBitmapCounter only when necessary
* peekLength is implemented using ImmutableRoaringBitmap.serializedSizeInBytes, 
which is very fast since only the header of the roaring format is examined
* It can serialize directly to a ByteBuffer; no intermediate buffer is allocated

  was:
We found the old BitmapCounter does not perform very well on very large bitmap. 
The inefficiency comes from
* poor serialize implementation: instead of serialize bitmap directly to 
ByteBuffer, it uses ByteArrayOutputStream as medium, which causes superfluous 
memory allocations
* poor peekLength implementation: the whole bitmap is deserialized in order to 
retrieve its serialized size
* extra deserialize cost: even if only cardinality info is needed to answer the 
query, the whole bitmap is deserialize into MutableRoaringBitmap

A new BitmapCounter is designed to solve these problems
* It comes in tow flavors, mutable and immutable, which is based on 
MutableRoaringBitmap and ImmutableRoaringBitmap correspondingly
* ImmutableBitmapCounter has lower deserialize cost, it just maps to a copied 
buffer. So we always deserialize to ImmutableBitmapCounter at first, and 
convert it to MutableBitmapCounter when necessary
* peekLength is implemented using ImmutableRoaringBitmap, which is very fast 
since only header of roaring bitmap is examined
* It directly serializes to ByteBuffer, no intermediate buffer is allocated


> A new BitmapCounter with better performance
> ---
>
> Key: KYLIN-2387
> URL: https://issues.apache.org/jira/browse/KYLIN-2387
> Project: Kylin
> Issue Type: Improvement
> Components: Metadata, Query Engine, Storage - HBase
> Affects Versions: v2.0.0
> Reporter: Dayue Gao
> Assignee: Dayue Gao





[jira] [Updated] (KYLIN-2387) A new BitmapCounter with better performance

2017-01-13 Thread Dayue Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dayue Gao updated KYLIN-2387:
-
Description: 
We found that the old BitmapCounter does not perform well on very large bitmaps. 
The inefficiency comes from
* Poor serialize implementation: instead of serializing the bitmap directly to 
ByteBuffer, it uses ByteArrayOutputStream as temporary storage, which causes 
superfluous memory allocations
* Poor peekLength implementation: the whole bitmap is deserialized in order to 
retrieve its serialized size
* Extra deserialize cost: even if only cardinality info is needed to answer 
a query, the whole bitmap is deserialized into MutableRoaringBitmap

A new BitmapCounter is designed to solve these problems
* It comes in two flavors, mutable and immutable, which are based on 
Mutable/Immutable RoaringBitmap respectively
* ImmutableBitmapCounter has lower deserialize cost, as it just maps to a 
copied buffer. So we always deserialize to ImmutableBitmapCounter first, and 
convert it to MutableBitmapCounter only when necessary
* peekLength is implemented using ImmutableRoaringBitmap.serializedSizeInBytes, 
which is very fast since only the header of the roaring format is examined
* It can serialize directly to ByteBuffer; no intermediate buffer is allocated
* The wire format is the same as before 
([RoaringFormatSpec|https://github.com/RoaringBitmap/RoaringFormatSpec/]). 
Therefore no cube rebuild is needed
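
The difference between staging bytes in a ByteArrayOutputStream and writing straight into the target ByteBuffer can be sketched as follows. This is an illustration only, not the Kylin implementation: a plain `long[]` stands in for the bitmap payload, and the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

/** Sketch (hypothetical names): two ways to serialize a toy payload into a ByteBuffer. */
final class DirectSerializeSketch {
    /** Staged: bytes go to a growable stream, are copied out, then copied into the target. */
    static void serializeViaStream(long[] words, ByteBuffer out) {
        ByteArrayOutputStream staging = new ByteArrayOutputStream();
        try (DataOutputStream data = new DataOutputStream(staging)) {
            data.writeInt(words.length);
            for (long w : words) {
                data.writeLong(w);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // toByteArray() allocates one more copy on top of the stream's internal array.
        out.put(staging.toByteArray());
    }

    /** Direct: every value is written into the caller's buffer, no temporary arrays. */
    static void serializeDirect(long[] words, ByteBuffer out) {
        out.putInt(words.length);
        for (long w : words) {
            out.putLong(w);
        }
    }
}
```

Both methods produce the same bytes when the target buffer uses the default big-endian order, so the direct path changes only the allocation behavior and not the output.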

  was:
We found that the old BitmapCounter does not perform well on very large bitmaps. 
The inefficiency comes from
* Poor serialize implementation: instead of serializing the bitmap directly to 
ByteBuffer, it uses ByteArrayOutputStream as temporary storage, which causes 
superfluous memory allocations
* Poor peekLength implementation: the whole bitmap is deserialized in order to 
retrieve its serialized size
* Extra deserialize cost: even if only cardinality info is needed to answer 
a query, the whole bitmap is deserialized into MutableRoaringBitmap

A new BitmapCounter is designed to solve these problems
* It comes in two flavors, mutable and immutable, which are based on 
Mutable/Immutable RoaringBitmap respectively
* ImmutableBitmapCounter has lower deserialize cost, as it just maps to a 
copied buffer. So we always deserialize to ImmutableBitmapCounter first, and 
convert it to MutableBitmapCounter only when necessary
* peekLength is implemented using ImmutableRoaringBitmap.serializedSizeInBytes, 
which is very fast since only the header of the roaring format is examined
* It can serialize directly to ByteBuffer; no intermediate buffer is allocated
* The wire format is the same as before, see 
[RoaringFormatSpec|https://github.com/RoaringBitmap/RoaringFormatSpec/]. 
Therefore no cube rebuild is needed


> A new BitmapCounter with better performance
> ---
>
> Key: KYLIN-2387
> URL: https://issues.apache.org/jira/browse/KYLIN-2387
> Project: Kylin
> Issue Type: Improvement
> Components: Metadata, Query Engine, Storage - HBase
> Affects Versions: v2.0.0
> Reporter: Dayue Gao
> Assignee: Dayue Gao
>
> We found that the old BitmapCounter does not perform well on very large 
> bitmaps. The inefficiency comes from
> * Poor serialize implementation: instead of serializing the bitmap directly to 
> ByteBuffer, it uses ByteArrayOutputStream as temporary storage, which causes 
> superfluous memory allocations
> * Poor peekLength implementation: the whole bitmap is deserialized in order 
> to retrieve its serialized size
> * Extra deserialize cost: even if only cardinality info is needed to answer 
> a query, the whole bitmap is deserialized into MutableRoaringBitmap
>
> A new BitmapCounter is designed to solve these problems
> * It comes in two flavors, mutable and immutable, which are based on 
> Mutable/Immutable RoaringBitmap respectively
> * ImmutableBitmapCounter has lower deserialize cost, as it just maps to a 
> copied buffer. So we always deserialize to ImmutableBitmapCounter first, 
> and convert it to MutableBitmapCounter only when necessary
> * peekLength is implemented using 
> ImmutableRoaringBitmap.serializedSizeInBytes, which is very fast since only 
> the header of the roaring format is examined
> * It can serialize directly to ByteBuffer; no intermediate buffer is 
> allocated
> * The wire format is the same as before 
> ([RoaringFormatSpec|https://github.com/RoaringBitmap/RoaringFormatSpec/]). 
> Therefore no cube rebuild is needed





[jira] [Updated] (KYLIN-2387) A new BitmapCounter with better performance

2017-01-13 Thread Dayue Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dayue Gao updated KYLIN-2387:
-
Description: 
We found that the old BitmapCounter does not perform well on very large bitmaps. 
The inefficiency comes from
* Poor serialize implementation: instead of serializing the bitmap directly to 
ByteBuffer, it uses ByteArrayOutputStream as temporary storage, which causes 
superfluous memory allocations
* Poor peekLength implementation: the whole bitmap is deserialized in order to 
retrieve its serialized size
* Extra deserialize cost: even if only cardinality info is needed to answer 
a query, the whole bitmap is deserialized into MutableRoaringBitmap

A new BitmapCounter is designed to solve these problems
* It comes in two flavors, mutable and immutable, which are based on 
Mutable/Immutable RoaringBitmap respectively
* ImmutableBitmapCounter has lower deserialize cost, as it just maps to a 
copied buffer. So we always deserialize to ImmutableBitmapCounter first, and 
convert it to MutableBitmapCounter only when necessary
* peekLength is implemented using ImmutableRoaringBitmap.serializedSizeInBytes, 
which is very fast since only the header of the roaring format is examined
* It can serialize directly to ByteBuffer; no intermediate buffer is allocated
* The wire format is the same as before, see 
[RoaringFormatSpec|https://github.com/RoaringBitmap/RoaringFormatSpec/]. 
Therefore no cube rebuild is needed
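
The "deserialize to the immutable form first, convert to mutable only when necessary" idea can be sketched with a toy counter. This is an assumption-laden illustration, not the Kylin classes: the payload is an int word count followed by that many longs instead of the roaring format, and all names are hypothetical.

```java
import java.nio.ByteBuffer;

/** Sketch (hypothetical names): a counter that stays a buffer view until it is mutated. */
final class LazyBitmapCounterSketch {
    private final ByteBuffer serialized; // toy layout: int wordCount, then wordCount longs
    private long[] words;                // materialized only by the first mutation

    LazyBitmapCounterSketch(ByteBuffer serialized) {
        // Wrapping the buffer is the whole "deserialization" on the read path.
        this.serialized = serialized.asReadOnlyBuffer();
    }

    boolean isMaterialized() {
        return words != null;
    }

    /** Read path: counts set bits straight from the buffer while still immutable. */
    long cardinality() {
        long count = 0;
        if (words != null) {
            for (long w : words) {
                count += Long.bitCount(w);
            }
            return count;
        }
        int base = serialized.position();
        int n = serialized.getInt(base);
        for (int i = 0; i < n; i++) {
            count += Long.bitCount(serialized.getLong(base + 4 + 8 * i));
        }
        return count;
    }

    /** Write path: pays the copy cost once; bit must fall inside the existing words. */
    void add(int bit) {
        if (words == null) {
            int base = serialized.position();
            words = new long[serialized.getInt(base)];
            for (int i = 0; i < words.length; i++) {
                words[i] = serialized.getLong(base + 4 + 8 * i);
            }
        }
        words[bit >>> 6] |= 1L << (bit & 63);
    }
}
```

A query that only needs the cardinality never triggers the copy in `add`, which is the saving the description attributes to ImmutableBitmapCounter.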

  was:
We found that the old BitmapCounter does not perform well on very large bitmaps. 
The inefficiency comes from
* Poor serialize implementation: instead of serializing the bitmap directly to 
ByteBuffer, it uses ByteArrayOutputStream as temporary storage, which causes 
superfluous memory allocations
* Poor peekLength implementation: the whole bitmap is deserialized in order to 
retrieve its serialized size
* Extra deserialize cost: even if only cardinality info is needed to answer 
a query, the whole bitmap is deserialized into MutableRoaringBitmap

A new BitmapCounter is designed to solve these problems
* It comes in two flavors, mutable and immutable, which are based on 
Mutable/Immutable RoaringBitmap respectively
* ImmutableBitmapCounter has lower deserialize cost; it just maps to a copied 
buffer. So we always deserialize to ImmutableBitmapCounter first, and 
convert it to MutableBitmapCounter only when necessary
* peekLength is implemented using ImmutableRoaringBitmap.serializedSizeInBytes, 
which is very fast since only the header of the roaring format is examined
* It can serialize directly to ByteBuffer; no intermediate buffer is allocated
* The wire format is the same as before, see 
[RoaringFormatSpec|https://github.com/RoaringBitmap/RoaringFormatSpec/]. 
Therefore no cube rebuild is needed


> A new BitmapCounter with better performance
> ---
>
> Key: KYLIN-2387
> URL: https://issues.apache.org/jira/browse/KYLIN-2387
> Project: Kylin
> Issue Type: Improvement
> Components: Metadata, Query Engine, Storage - HBase
> Affects Versions: v2.0.0
> Reporter: Dayue Gao
> Assignee: Dayue Gao
>
> We found that the old BitmapCounter does not perform well on very large 
> bitmaps. The inefficiency comes from
> * Poor serialize implementation: instead of serializing the bitmap directly to 
> ByteBuffer, it uses ByteArrayOutputStream as temporary storage, which causes 
> superfluous memory allocations
> * Poor peekLength implementation: the whole bitmap is deserialized in order 
> to retrieve its serialized size
> * Extra deserialize cost: even if only cardinality info is needed to answer 
> a query, the whole bitmap is deserialized into MutableRoaringBitmap
>
> A new BitmapCounter is designed to solve these problems
> * It comes in two flavors, mutable and immutable, which are based on 
> Mutable/Immutable RoaringBitmap respectively
> * ImmutableBitmapCounter has lower deserialize cost, as it just maps to a 
> copied buffer. So we always deserialize to ImmutableBitmapCounter first, 
> and convert it to MutableBitmapCounter only when necessary
> * peekLength is implemented using 
> ImmutableRoaringBitmap.serializedSizeInBytes, which is very fast since only 
> the header of the roaring format is examined
> * It can serialize directly to ByteBuffer; no intermediate buffer is 
> allocated
> * The wire format is the same as before, see 
> [RoaringFormatSpec|https://github.com/RoaringBitmap/RoaringFormatSpec/]. 
> Therefore no cube rebuild is needed


