Hi Shi Wei,

(looping back to user@nutch - sorry, should have replied to the list)

First, the masking of sensitive strings is tracked in
   https://issues.apache.org/jira/browse/NUTCH-2905
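
To illustrate what "masking" could mean here, a minimal sketch follows. It is
not the NUTCH-2905 patch and not Nutch code: the class name ParamMasker, the
method mask() and the rule "any parameter whose name contains 'password' is
sensitive" are assumptions made only for this example.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper (not part of Nutch): hides secret-looking values before logging. */
public class ParamMasker {

  /** Returns a copy of params in which the values of sensitive keys are replaced. */
  public static Map<String, String> mask(Map<String, String> params) {
    Map<String, String> masked = new LinkedHashMap<>();
    for (Map.Entry<String, String> e : params.entrySet()) {
      String key = e.getKey().toLowerCase(Locale.ROOT);
      // Assumption: a parameter is sensitive if its name contains "password".
      boolean sensitive = key.contains("password");
      masked.put(e.getKey(), sensitive ? "*****" : e.getValue());
    }
    return masked;
  }

  public static void main(String[] args) {
    // Same parameter names as in the index writer table quoted below.
    Map<String, String> params = new LinkedHashMap<>();
    params.put("username", "username");
    params.put("password", "SECRET");
    // Log the masked copy instead of the original map.
    System.out.println(mask(params));
  }
}
```

The input map is left untouched, so the index writer would still receive the
real password; only the copy passed to the logger is masked.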

Second, to disable the logging:

The class doing the logging is IndexerOutputFormat, so you need to add
the following line to log4j.properties:

  log4j.logger.org.apache.nutch.indexer.IndexerOutputFormat=WARN

or, for Nutch 1.19 and the current master, edit the file conf/log4j2.xml
and add to the list of <Loggers>:

    <Logger name="org.apache.nutch.indexer.IndexerOutputFormat"
            level="WARN" additivity="false">
      <Appender-ref ref="RollingFile" level="WARN" />
    </Logger>

Best,
Sebastian

On 11/17/21 14:07, sw.l...@quandatics.com wrote:
> Hi Sebastian,
>  
> Thanks for your reply.
>  
> According to the statement, "You could set the log level for the class
> logging the password from INFO to WARN.", may we know which
> class/parameter that we should set to only restrict the Elasticsearch
> indexer logs to WARN level? This is because we have tried to set the
> following in the log4j.properties but it doesn't help.
>  
> log4j.logger.org.apache.nutch.indexwriter.elastic.ElasticIndexWriter=WARN,cmdstdout
> log4j.logger.org.apache.nutch.indexwriter.elastic.ElasticUtils=WARN,cmdstdout
>  
>  
> Best Regards,
> Shi Wei
>  
> On 2021-11-15 21:26, Sebastian Nagel wrote:
>> Hi Shi Wei,
>>
>>> hide the password value in hadoop.log file table ?
>>
>> You could set the log level for the class logging the password from INFO
>> to WARN. Then the index writer configuration isn't logged anymore.
>> As said, this is a work-around not a final solution which should,
>> of course, mask passwords when logging.
>>
>>> We also ran into an issue where an https connection could not be
>>> established with elasticsearch
>>
>> If the problem persists could you start a separate thread?
>>
>> Thanks,
>> Sebastian
>>
>> On 11/12/21 10:57, sw.l...@quandatics.com
>> <mailto:sw.l...@quandatics.com> wrote:
>>> Hi, Sebastian
>>>
>>> Thanks for your suggestion, may I know if there is a way to hide the
>>> password value in hadoop.log file table ?
>>> We also ran into an issue where an https connection could not be
>>> established with elasticsearch. Do you have any suggestions to solve
>>> this problem?
>>> Thanks
>>>
>>>
>>>
>>> Best Regards,
>>>  Shi Wei
>>>
>>> -----Original Message-----
>>> From: Sebastian Nagel <wastl.na...@googlemail.com.INVALID
>>> <mailto:wastl.na...@googlemail.com.INVALID>>
>>> Sent: Friday, 12 November, 2021 1:20 AM
>>> To: user@nutch.apache.org <mailto:user@nutch.apache.org>
>>> Subject: Re: encrypt password of the index-writer.xml
>>>
>>> Hi Shi Wei,
>>>
>>> there is a way, although definitely not the recommended one.
>>> Sorry, it took me a little while to prove it.
>>>
>>> Do you know about external XML entities or XXE attacks?
>>>
>>> 1. At the top of index-writers.xml you add an entity declaration:
>>>
>>> <?xml version="1.0" encoding="UTF-8" ?>
>>> <!DOCTYPE urlset [
>>>   <!ENTITY CREDENTIALS SYSTEM "file:///path/to/credentials.txt">
>>> ]>
>>>
>>>
>>> 2. it's used later in the index writer spec:
>>>
>>>   <writer id="indexer_solr_1"
>>>           class="org.apache.nutch.indexwriter.solr.SolrIndexWriter">
>>>     <parameters>
>>>       ...
>>>       &CREDENTIALS;
>>>     </parameters>
>>>
>>> 3. you add your credentials snippet to the file /path/to/credentials.txt
>>>
>>> <param name="username" value="username"/>
>>> <param name="password" value="SECRET"/>
>>>
>>> 4. and voila:
>>>
>>> $> bin/nutch index crawldb segment
>>> ...
>>> ├────────────┼─────────────────────────────┼─────────┤
>>> │username    │The username of Solr server. │username │
>>> ├────────────┼─────────────────────────────┼─────────┤
>>> │password    │The password of Solr server. │SECRET   │
>>> └────────────┴─────────────────────────────┴─────────┘
>>>
>>>
>>> Note: this is a dirty hack but not a security issue: with access to
>>> the index-writers.xml you can write anything into it.  But there is
>>> no guarantee that this hack will continue to work in the future.
>>>
>>> Would you be so kind as to open a Jira issue to add real support
>>> for passwords in index-writers.xml?
>>>
>>> Best,
>>> Sebastian
>>>
>>>
>>>
>>> On 11/10/21 11:16, sw.l...@quandatics.com
>>> <mailto:sw.l...@quandatics.com> wrote:
>>>> Hi,
>>>>
>>>> We have tried the variable expansion method on the index-writers.xml,
>>>> but it doesn't work. Could you advise if there are any alternative ways
>>>> to encrypt the password in the index-writers.xml file?
>>>>
>>>> Best Regards,
>>>> Shi Wei
