Hi Shigeki,

The issue is that the "file part" of a rule must match whatever is left
of the path after the start point has been matched.  So the string the
file part is matched against will always begin with a "/".
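To make this concrete, here is a rough Python model of that matching.  It
is a sketch, not the connector's actual code: the helper names are
hypothetical, the start point is assumed to end with "/" (as it does in
Shigeki's log), and Python's fnmatchcase stands in for the connector's own
wildcard matcher (in fnmatchcase, "*" also matches "/", which may differ
from the connector).

```python
from fnmatch import fnmatchcase
from typing import Optional

# Hypothetical model of the rule matching; fnmatchcase stands in for the
# connector's own wildcard matcher.


def file_part(startpoint: str, actual: str) -> Optional[str]:
    """Return what is left of `actual` once the start point has matched."""
    if not actual.startswith(startpoint):
        return None  # this start point does not apply to the path
    # Keep the separator that ends the start point, so the remainder
    # always begins with "/".
    return actual[len(startpoint.rstrip("/")):]


def rule_matches(rule: str, startpoint: str, actual: str) -> bool:
    """True if a file rule such as "phs.txt" matches the remainder."""
    remainder = file_part(startpoint, actual)
    return remainder is not None and fnmatchcase(remainder, rule)


start = "smb://xxxxx/SharePrjG2/xxxxx/sug/"
path = start + "phs.txt"
print(file_part(start, path))                 # /phs.txt
print(rule_matches("phs.txt", start, path))   # False
print(rule_matches("/phs.txt", start, path))  # True
print(rule_matches("*phs.txt", start, path))  # True
```

This reproduces the "Checking 'phs.txt' against '/phs.txt'" / "No match!"
pair from the log, and also shows why Ahmet's "*text.txt" suggestion works.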

There are two things we could do: (1) document it, or (2) change it
(by removing the leading "/").  But remember that if there is a path
before the filename, it will also look funny, e.g.:

/my/path/and/file.txt

would become

my/path/and/file.txt

Furthermore, if we change the behavior, some people's existing jobs
might stop working correctly.
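A rough sketch of option (2), again using Python's fnmatchcase as a
stand-in for the connector's matcher (proposed_file_part is a hypothetical
name, not connector code), shows both the cosmetic effect and the
compatibility risk:

```python
from fnmatch import fnmatchcase


def proposed_file_part(remainder: str) -> str:
    """Option (2): drop the leading "/" before matching (hypothetical)."""
    return remainder[1:] if remainder.startswith("/") else remainder


# Cosmetic effect on a remainder that includes subdirectories.
print(proposed_file_part("/my/path/and/file.txt"))  # my/path/and/file.txt

# Compatibility risk: a rule written for today's behavior stops matching.
existing_rule = "/phs.txt"
print(fnmatchcase("/phs.txt", existing_rule))                      # True (today)
print(fnmatchcase(proposed_file_part("/phs.txt"), existing_rule))  # False (after)

# Rules that start with a wildcard match either way.
print(fnmatchcase(proposed_file_part("/phs.txt"), "*phs.txt"))     # True
```

Under this model, only rules that spell out the leading "/" explicitly
would break; wildcard-prefixed rules keep working.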

I will open a ticket to track discussion of this issue: CONNECTORS-526.

Thanks,
Karl

On Tue, Sep 11, 2012 at 10:11 PM, Shigeki Kobayashi
<[email protected]> wrote:
> I found the reason the MCF job does not recognize the file name to exclude
> from crawling: you need to put a slash character in front of the file name.
>
> I obtained the log below. This time I had a root directory,
> //xxxxx/SharePrjG2/xxxxx/sug/, and placed a file named "phs.txt" in it.
> In the job settings, I entered "phs.txt" to exclude the file from crawling,
> so the crawling rule became the following:
>
>   1. Exclude file(s) matching phs.txt
>
> DEBUG 2012-09-12 09:56:35,829 (Worker thread '31') - JCIFS: Matching
> startpoint 'smb://xxxxx/SharePrjG2/xxxxx/sug/' against actual
> 'smb://xxxxx/SharePrjG2/xxxxx/sug/'
> DEBUG 2012-09-12 09:56:35,829 (Worker thread '31') - JCIFS: Startpoint
> found!
> DEBUG 2012-09-12 09:56:35,829 (Worker thread '31') - JCIFS: Checking
> 'phs.txt' against '/phs.txt'
> DEBUG 2012-09-12 09:56:35,829 (Worker thread '31') - JCIFS: No match!
>
>
> The third log entry above shows that "phs.txt" does not match "/phs.txt".
>
> I think it is hard for users to figure out that a slash is needed.
> If this behavior is going to be kept, it would be nice to state this
> rule in the user documentation.
>
> Thanks for your help.
>
>
> Regards,
>
>
> Shigeki
>
> 2012/9/11 Karl Wright <[email protected]>
>>
>> I am wondering if there might be another locale-specific toLowerCase()
>> issue like we saw in Turkey...
>>
>> I've asked Shigeki to turn on connector debugging and send us the log.
>> That should show whether the rule is failing to match because of case.
>>
>> Karl
>>
>> On Tue, Sep 11, 2012 at 7:44 AM, Ahmet Arslan <[email protected]> wrote:
>> > Hi Shigeki
>> >
>> > Can you try entering "*text.txt" in the text box?
>> >
>> > Ahmet
>> > --- On Tue, 9/11/12, Shigeki Kobayashi
>> > <[email protected]> wrote:
>> >
>> > From: Shigeki Kobayashi <[email protected]>
>> > Subject: Rules of excluding specific files in Windows file server are
>> > not recognized
>> > To: [email protected]
>> > Date: Tuesday, September 11, 2012, 1:46 PM
>> >
>> > Hi guys.
>> > I need some help in excluding specific files from crawling.
>> > I am trying to crawl a Windows file server using the Windows shares
>> > connector and index the content into Solr.
>> >
>> > There are some files I do not want to index, so I set paths to exclude
>> > them from crawling, but the job crawls them anyway.
>> > For example, I do NOT want to index "text.txt" in directory D, which is
>> > the root path.
>> >
>> >
>> > In the "Paths" tab:
>> >   - Set D as the root path.
>> >   - To create crawling rules, choose "exclude" and "file" from the
>> >     pulldowns, and enter "text.txt" in the text box.
>> >   - The list of crawling rules is created as follows:
>> >       1. Exclude file(s) matching text.txt
>> >       2. Include indexable file(s) matching *
>> >       3. Include directory(s) matching *
>> >   - Save the job settings.
>> > As a result, the job still tries to crawl the file. I wonder why
>> > "text.txt" does not match the crawling rule.
>> >
>> >
>> > Does anyone know what I did wrong?
>> > Versions: MCF 0.5, Solr 3.5, MySQL 5.5
>> >
>> > Regards,
>> > Shigeki
>> >
>> >
>
>
>
>
> --
> 〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜
>  SoftBank Mobile Corp.
>  Information Systems Division
>  System Services Business Unit
>  Service Planning Department
>
>  Shigeki Kobayashi
>  [email protected]
> 〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜
>
>
>
