Here is the scope of my project. We want to collect content from a document management system (Nuxeo), an intranet (Drupal), and files on a file system (shared drives) so that it can be retrieved through a search engine. All of these sources hold internal information for an internal audience, and the content is unstructured (documents and web pages). We want to use Nutch as the crawler for these sources. Tika would then extract and format the data, and the result would be committed to Elasticsearch (or Solr).
1) Is Nutch an appropriate solution for collecting documents and their metadata from a file system (shared drives)?
2) Does Nutch have the ability to collect the permissions that are set on the NTFS Security tab of a directory tree or of a file?
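
To make question 2 concrete, here is a minimal sketch of the kind of permission data I mean. It uses only the standard Java `AclFileAttributeView`, not any Nutch API; the class name `AclDump` and the method `describeAcl` are hypothetical names chosen for illustration. I am assuming the crawler runs on a platform where Java exposes the NTFS ACL (typically Windows); elsewhere the view may be unavailable and the sketch returns an empty list.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.AclEntry;
import java.nio.file.attribute.AclFileAttributeView;
import java.util.ArrayList;
import java.util.List;

public class AclDump {

    // Hypothetical helper: returns one text line per ACL entry of the given
    // file or directory, or an empty list when the file system does not
    // expose an ACL view to Java.
    static List<String> describeAcl(Path path) throws IOException {
        List<String> lines = new ArrayList<>();
        AclFileAttributeView view =
                Files.getFileAttributeView(path, AclFileAttributeView.class);
        if (view == null) {
            return lines; // ACLs not supported on this file system
        }
        for (AclEntry entry : view.getAcl()) {
            // type is ALLOW, DENY, AUDIT or ALARM; principal is a user or group
            lines.add(entry.type() + " "
                    + entry.principal().getName() + " "
                    + entry.permissions());
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Path to inspect; defaults to the current directory.
        Path path = Paths.get(args.length > 0 ? args[0] : ".");
        for (String line : describeAcl(path)) {
            System.out.println(line);
        }
    }
}
```

What I would like to know is whether Nutch can capture this information (principal, allow/deny, permission set) during the crawl and pass it along as metadata to the index.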

