[jira] [Comment Edited] (OAK-2498) Root record references provide too little context for parsing a segment

2016-10-17 Thread Francesco Mari (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15582248#comment-15582248
 ] 

Francesco Mari edited comment on OAK-2498 at 10/17/16 1:22 PM:
---

In an offline conversation with [~mduerig] and [~alex.parvulescu] we figured 
out that we need to add the following types to the ones currently recognized by 
the system.

- Binary reference. This has to be used when parsing the segment to detect 
binary reference records. Detecting binary references is necessary to keep the 
index of external binary references in the TAR writer up to date.
- List of property names. A list of strings, where every string is a property 
name, is referenced by the template record.
- List of lists of values. This list is pointed to by the node record and 
contains the values for single- and multi-value properties of that node. The 
double indirection is needed to support multi-value properties.
- Map from string to node. This map is referenced by the template and 
represents the child relationship between nodes.
- Super root. This is a marker type identifying top-level records for the 
repository super-roots.
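
As a rough illustration of the list above, the proposed types could be modeled 
as additional tags in a record type enumeration. This is a minimal sketch in 
Python, not Oak code: the enum name, the member names and the numbering are all 
hypothetical, and the "existing" members are only an illustrative subset.

```python
from enum import Enum, unique


@unique
class RecordType(Enum):
    """Hypothetical record type tags; names and numbers are illustrative only."""

    # Assumed subset of the types the system already recognizes.
    LIST = 0
    VALUE = 1
    TEMPLATE = 2
    NODE = 3

    # Types proposed in this comment.
    BINARY_REFERENCE = 4
    PROPERTY_NAME_LIST = 5
    VALUE_LIST_LIST = 6
    CHILD_NODE_MAP = 7
    SUPER_ROOT = 8


# Which record refers to each new container type, as described in the list above.
REFERENCED_BY = {
    RecordType.PROPERTY_NAME_LIST: RecordType.TEMPLATE,
    RecordType.VALUE_LIST_LIST: RecordType.NODE,
    RecordType.CHILD_NODE_MAP: RecordType.TEMPLATE,
}
```

Binary reference and super root have no entry in the lookup table because the 
comment describes them as a detection aid and a top-level marker, not as 
records referenced by a template or node.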

I will go ahead with the implementation and keep this issue up to date. As 
[~mduerig] suggested, the implementation should probably include a segment 
parser to validate the correctness of the serialization format.
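
To show why more specific types make such a parser possible, here is a toy 
sketch in Python. It is not the Oak segment format: records are reduced to 
lists of offsets they reference, and the type names, the rule table and the 
function name are assumptions made for illustration. Starting from typed root 
references, the parser can assign a type to every referenced record only if the 
container type implies the type of its items.

```python
# Hypothetical rule table: the item type implied by each container type.
ITEM_TYPE = {
    "PROPERTY_NAME_LIST": "STRING",
    "VALUE_LIST_LIST": "VALUE_LIST",
    "VALUE_LIST": "VALUE",
}


def infer_types(records, roots):
    """Assign a type to every record reachable from the typed roots.

    records: dict mapping an offset to the list of offsets that record references
    roots:   list of (type, offset) pairs, standing in for root record references
    Returns a dict mapping offset to type; raises ValueError if a type is
    unknown, conflicting, or a referenced record is missing.
    """
    types = {}
    stack = list(roots)
    while stack:
        rtype, offset = stack.pop()
        if offset not in records:
            raise ValueError(f"no record at offset {offset}")
        known = types.get(offset)
        if known is not None:
            if known != rtype:
                raise ValueError(f"offset {offset}: both {known} and {rtype}")
            continue
        types[offset] = rtype
        refs = records[offset]
        if not refs:
            continue
        item_type = ITEM_TYPE.get(rtype)
        if item_type is None:
            # This is the situation described in the issue: a generic
            # container whose items are record ids of unknown type.
            raise ValueError(f"offset {offset}: items of {rtype} have unknown type")
        stack.extend((item_type, ref) for ref in refs)
    return types


records = {0: [8, 16], 8: [], 16: []}
typed = infer_types(records, [("PROPERTY_NAME_LIST", 0)])
```

With the root typed as a list of property names, both items resolve to strings. 
Passing the same records with a generic `("LIST", 0)` root raises `ValueError`, 
because nothing tells the parser what the list items are.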



> Root record references provide too little context for parsing a segment
> ---
>
> Key: OAK-2498
> URL: https://issues.apache.org/jira/browse/OAK-2498
> Project: Jackrabbit Oak
> Issue Type: Bug
> Components: segment-tar
> Reporter: Michael Dürig
> Assignee: Francesco Mari
> Labels: tools
> Fix For: Segment Tar 0.0.16
>
>
> According to the [documentation | 
> http://jackrabbit.apache.org/oak/docs/nodestore/segmentmk.html] the root 
> record references in a segment header provide enough context for parsing all 
> records within this segment without any external information. 
> Turns out this is not true: if a root record reference points e.g. to a list 
> record, the items in that list are record ids of unknown type. So even though 
> those records might live in the same segment, we can't parse them as we don't 
> know their type. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (OAK-2498) Root record references provide too little context for parsing a segment

2016-10-04 Thread Alex Parvulescu (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15544848#comment-15544848
 ] 

Alex Parvulescu edited comment on OAK-2498 at 10/4/16 12:07 PM:


results:
{noformat}
Total size:
20 GB in  82284 data segments
768 KB in  3 bulk segments
4 GB in maps (46450859 leaf and branch records)
1 GB in lists (55469092 list and bucket records)
3 GB in values (value and block records of 70765667 properties, 3429/378684/0/1214419 small/medium/long/external blobs, 46258452/1862224/159 small/medium/long strings)
194 MB in templates (16772712 template records)
3 GB in nodes (251591739 node records)
{noformat}


was (Author: alex.parvulescu):
sure, I'll post the results here

> Root record references provide too little context for parsing a segment
> ---
>
> Key: OAK-2498
> URL: https://issues.apache.org/jira/browse/OAK-2498
> Project: Jackrabbit Oak
> Issue Type: Bug
> Components: segment-tar
> Reporter: Michael Dürig
> Assignee: Francesco Mari
> Labels: tools
> Fix For: Segment Tar 0.0.14
>
>
> According to the [documentation | 
> http://jackrabbit.apache.org/oak/docs/nodestore/segmentmk.html] the root 
> record references in a segment header provide enough context for parsing all 
> records within this segment without any external information. 
> Turns out this is not true: if a root record reference points e.g. to a list 
> record, the items in that list are record ids of unknown type. So even though 
> those records might live in the same segment, we can't parse them as we don't 
> know their type. 


