[ https://issues.apache.org/jira/browse/SOLR-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735926#action_12735926 ]
Noble Paul commented on SOLR-1313:
----------------------------------
Reading from gzipped files is good, but unless we have a corresponding means to
walk through the list of files inside the zipped file, it is not so useful. Most
zip files will contain multiple files.
I should be able to do something like the following:
{code:xml}
<dataSource name="zip" type="CompressedFileDataSource" format="gzip"/>
<document>
  <entity name="f" processor="CompressedFileListEntityProcessor"
          file="/some/path/to/files.gz" fileName=".*xml"
          recursive="true" rootEntity="false">
    <entity name="x" dataSource="zip"
            processor="XPathEntityProcessor"
            forEach="/the/record/xpath"
            url="${f.fileAbsolutePath}">
      <field column="full_name" xpath="/field/xpath"/>
    </entity>
  </entity>
</document>
{code}
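To illustrate the "walk through the file list" step, here is a minimal sketch. It assumes the archive is in zip format (a plain .gz stream has no directory of entries to walk) and uses only java.util.zip. The class and method names are hypothetical and not part of DIH; the regex plays the role of the fileName attribute in the config above.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not an existing DIH class.
public class ZipEntryLister {

    /**
     * Returns the names of all non-directory entries in the zip archive
     * whose full entry name matches the given regular expression.
     */
    public static List<String> listEntries(String zipPath, String fileNameRegex)
            throws IOException {
        Pattern pattern = Pattern.compile(fileNameRegex);
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(zipPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                // Skip directory entries; match against the full entry path.
                if (!entry.isDirectory()
                        && pattern.matcher(entry.getName()).matches()) {
                    names.add(entry.getName());
                }
            }
        }
        return names;
    }
}
```

An entity processor along these lines could emit one row per returned name, which the inner entity would then open through the data source.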
I guess the URLDataSource should be able to accept URLs of the format
"jar:file:/home/duke/duke.jar!/a.xml"
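For reference, the JDK's built-in jar: URL handler already resolves URLs of this shape for zip-format archives. The following is a minimal sketch of reading one entry that way, independent of DIH. The class and method names are hypothetical, and it assumes entry names need no URL escaping; whether URLDataSource passes such URLs through unchanged is not verified here.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;

// Hypothetical helper, not an existing DIH class.
public class JarUrlReader {

    /** Builds a URL of the form jar:file:/path/to/archive!/entry. */
    public static String jarUrlFor(String archivePath, String entryName) {
        // Assumption: entryName contains no characters that need escaping.
        return "jar:" + new File(archivePath).toURI() + "!/" + entryName;
    }

    /** Reads the full content of the archive entry addressed by a jar: URL. */
    public static byte[] read(String jarUrl) throws IOException {
        URLConnection conn = URI.create(jarUrl).toURL().openConnection();
        // Avoid keeping the archive open in the JVM-wide jar cache.
        conn.setUseCaches(false);
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return buffer.toByteArray();
        }
    }
}
```

With something like this, the inner entity's url could be set to the jar: URL of each listed entry rather than to a path on disk.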
> DIH should be able to read gziped files
> ---------------------------------------
>
> Key: SOLR-1313
> URL: https://issues.apache.org/jira/browse/SOLR-1313
> Project: Solr
> Issue Type: New Feature
> Components: contrib - DataImportHandler
> Affects Versions: 1.5
> Reporter: Yousef Ourabi
> Attachments: GzipFileDataSource.java
>
> Original Estimate: 2h
> Remaining Estimate: 2h
>
> For very large (file) imports it would be beneficial to be able to read from
> gzipped files which should also improve performance (less disk I/O)