Here is the scope of my project.

We want to collect content from a document management system (Nuxeo), an
intranet (Drupal) and files from the file system (shared drives) in order
to make it retrievable through a search engine. All of these sources hold
internal information for an internal audience, and the content is
unstructured (documents and web pages). We want to use Nutch as the
crawler for these sources. Tika would then extract and format the data,
which would be committed to Elasticsearch (or Solr).

1) Is Nutch an appropriate solution for collecting documents and their
metadata from a file system (shared drives)?

2) Is Nutch able to collect the permissions that are set on the NTFS
Security tab of a directory tree or of an individual file?
