[ 
https://issues.apache.org/jira/browse/NUTCH-506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12509090
 ] 

Doğacan Güney edited comment on NUTCH-506 at 6/29/07 5:50 AM:
--------------------------------------------------------------

This patch changes Content (which is no longer a CompressedWritable) and 
ParseText (which goes from VersionedWritable(*) to Writable). These changes 
are backwards compatible, so old segments can still be read after this patch.

The patch also changes Content's public API very slightly: the 
Content.forceInflate method is removed because it is no longer needed.

(*) I don't understand how VersionedWritable works. AFAICS, there is no easy 
way to find out which version you just read, so it is useless for data 
versioning.
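
To illustrate the footnote, here is a minimal sketch of a record that keeps 
the version byte it has just read and branches on it, which is what makes 
reading older data possible. It is not the code from compress.patch: 
VersionAwareRecord, its two layouts, and the helper methods are hypothetical, 
and it uses plain java.io instead of Hadoop's Writable so that it runs on its 
own.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical record (not Nutch's ParseText). It mirrors the
// write/readFields shape of a Writable using only java.io.
class VersionAwareRecord {
    static final byte CURRENT_VERSION = 2;

    private byte versionRead = CURRENT_VERSION; // set by the last readFields call
    private String text = "";

    VersionAwareRecord() {}

    VersionAwareRecord(String text) {
        this.text = text;
    }

    // Always write the newest layout: version byte, then the payload.
    void write(DataOutput out) throws IOException {
        out.writeByte(CURRENT_VERSION);
        out.writeUTF(text);
    }

    // Read the version byte first, remember it, then pick the matching layout.
    void readFields(DataInput in) throws IOException {
        versionRead = in.readByte();
        switch (versionRead) {
            case 1: { // assumed old layout: int length + raw UTF-8 bytes
                byte[] raw = new byte[in.readInt()];
                in.readFully(raw);
                text = new String(raw, StandardCharsets.UTF_8);
                break;
            }
            case 2: { // current layout: writeUTF/readUTF
                text = in.readUTF();
                break;
            }
            default:
                throw new IOException("Unsupported version: " + versionRead);
        }
    }

    byte getVersionRead() { return versionRead; }

    String getText() { return text; }

    // Convenience helpers for the sketch; they wrap the checked IOException.
    byte[] toBytes() {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try {
            write(new DataOutputStream(buf));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    static VersionAwareRecord fromBytes(byte[] data) {
        VersionAwareRecord r = new VersionAwareRecord();
        try {
            r.readFields(new DataInputStream(new ByteArrayInputStream(data)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return r;
    }
}
```

After fromBytes, getVersionRead() reports which layout the bytes were in, so 
a caller can tell old data from new.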


 was:
This patch changes Content (Content is no longer a CompressedWritable) and 
ParseText (from VersionedWritable(*) to Writable). These changes are backwards 
compatible. So old segments can still be read after this patch.

Patch also changes Content's public api very slightly. Content.forceInflate 
method is removed because it is no longer needed.

> Nutch should delegate compression to Hadoop
> -------------------------------------------
>
>                 Key: NUTCH-506
>                 URL: https://issues.apache.org/jira/browse/NUTCH-506
>             Project: Nutch
>          Issue Type: Improvement
>            Reporter: Doğacan Güney
>             Fix For: 1.0.0
>
>         Attachments: compress.patch
>
>
> Some data structures within Nutch (such as Content and ParseText) handle their 
> own compression. We should delegate all compression to Hadoop. 
> Also, Nutch should respect the io.seqfile.compression.type setting. Currently, 
> even if io.seqfile.compression.type is BLOCK or RECORD, Nutch overrides it 
> for some structures and sets it to NONE. (However, IMO, ParseText should 
> always be compressed as RECORD for performance reasons.)
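
The policy described in the issue can be sketched roughly as follows. This is 
an illustration only, not the patch: java.util.Properties stands in for 
Hadoop's configuration object, the CompressionType enum and the 
CompressionSettings class are hypothetical, and the RECORD fallback used when 
the key is unset is an assumption.

```java
import java.util.Locale;
import java.util.Properties;

// Hypothetical stand-in for the three values named in the issue.
enum CompressionType { NONE, RECORD, BLOCK }

class CompressionSettings {
    static final String KEY = "io.seqfile.compression.type";

    // Read the configured type; falling back to RECORD is an assumption.
    static CompressionType configured(Properties conf) {
        String value = conf.getProperty(KEY, "RECORD");
        return CompressionType.valueOf(value.trim().toUpperCase(Locale.ROOT));
    }

    // Content simply follows the setting instead of forcing NONE.
    static CompressionType forContent(Properties conf) {
        return configured(conf);
    }

    // ParseText is pinned to RECORD, as the issue suggests, whatever is set.
    static CompressionType forParseText(Properties conf) {
        return CompressionType.RECORD;
    }
}
```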

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
