Hi Shigeki,

The issue is that the "file part" must match whatever is left of the path after the start point part has been matched. So, the file part will always begin with a "/".
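To illustrate the behavior, here is a minimal sketch in Python. It is not the connector's actual code: the helper names file_part and rule_matches are hypothetical, and fnmatch-style wildcards are assumed to stand in for the connector's rule matching.

```python
from fnmatch import fnmatchcase
from typing import Optional


def file_part(start_point: str, actual: str) -> Optional[str]:
    """Return what is left of `actual` once the start point is matched.

    Hypothetical helper: the remainder keeps its leading "/", which is
    why a bare file name never matches it.
    """
    base = start_point.rstrip("/")
    if not actual.startswith(base):
        return None
    remainder = actual[len(base):]
    # Reject partial directory-name matches such as ".../sug" vs ".../sugar".
    if remainder and not remainder.startswith("/"):
        return None
    return remainder


def rule_matches(pattern: str, start_point: str, actual: str) -> bool:
    """Check a rule pattern against the file part (assumed wildcard syntax)."""
    remainder = file_part(start_point, actual)
    return remainder is not None and fnmatchcase(remainder, pattern)


start = "smb://xxxxx/SharePrjG2/xxxxx/sug/"
document = start + "phs.txt"

print(file_part(start, document))
print(rule_matches("phs.txt", start, document))
print(rule_matches("/phs.txt", start, document))
print(rule_matches("*phs.txt", start, document))
```

Under these assumptions, the rule "phs.txt" fails because it is compared against "/phs.txt", while "/phs.txt" or a wildcard rule such as "*phs.txt" matches.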
There are two things we could do: (1) document it, or (2) change it (by removing the starting "/"). But remember that if there is a path before the filename, it will also look funny, e.g.:

/my/path/and/file.txt

would become

my/path/and/file.txt

Furthermore, if we change the behavior, some people's existing jobs may stop working correctly. I will open a ticket to track discussion of this issue: CONNECTORS-526.

Thanks,
Karl

On Tue, Sep 11, 2012 at 10:11 PM, Shigeki Kobayashi <[email protected]> wrote:
> I found the reason that the MCF job does not recognize the file name to
> exclude from crawling.
> You need to put a slash character followed by the file name.
>
> I obtained the log below. This time I had a root directory,
> //xxxxx/SharePrjG2/xxxxx/sug/, and placed a file named "phs.txt" in it.
> In the job settings, I entered "phs.txt" to exclude the file from crawling,
> so the crawling rule became the following:
>
> 1. Exclude file(s) matching phs.txt
>
> DEBUG 2012-09-12 09:56:35,829 (Worker thread '31') - JCIFS: Matching startpoint 'smb://xxxxx/SharePrjG2/xxxxx/sug/' against actual 'smb://xxxxx/SharePrjG2/xxxxx/sug/'
> DEBUG 2012-09-12 09:56:35,829 (Worker thread '31') - JCIFS: Startpoint found!
> DEBUG 2012-09-12 09:56:35,829 (Worker thread '31') - JCIFS: Checking 'phs.txt' against '/phs.txt'
> DEBUG 2012-09-12 09:56:35,829 (Worker thread '31') - JCIFS: No match!
>
> The third line above shows that "phs.txt" does not match "/phs.txt".
>
> Well, I feel it is hard for users to find out that a slash is needed.
> If this specification is going to be kept, I think it would be nice to
> state this rule in the user documentation.
>
> Thanks for your help.
>
> Regards,
>
> Shigeki
>
> 2012/9/11 Karl Wright <[email protected]>
>>
>> I am wondering if there might be another locale-specific toLowerCase()
>> issue like we saw in Turkey...
>>
>> I've asked Shigeki to turn on connector debugging and send us the log.
>> That should demonstrate if the rule is not matching due to case
>> reasons.
>>
>> Karl
>>
>> On Tue, Sep 11, 2012 at 7:44 AM, Ahmet Arslan <[email protected]> wrote:
>> > Hi Shigeki
>> >
>> > Can you try entering "*text.txt" in the text box?
>> >
>> > Ahmet
>> > --- On Tue, 9/11/12, Shigeki Kobayashi
>> > <[email protected]> wrote:
>> >
>> > From: Shigeki Kobayashi <[email protected]>
>> > Subject: Rules of excluding specific files in Windows file server are
>> > not recognized
>> > To: [email protected]
>> > Date: Tuesday, September 11, 2012, 1:46 PM
>> >
>> > Hi guys.
>> > I need some help in excluding specific files from crawling.
>> > I am trying to crawl a Windows file server using the Windows shares
>> > connector and index it into Solr.
>> >
>> > There are some files I do not want to index, so I set paths to exclude
>> > them from crawling, but the job crawls them anyway.
>> > For example, I do NOT want to index "text.txt" in a directory D, which
>> > is a root path.
>> >
>> > In the "Paths" tab:
>> > - Set D as the root path.
>> > - To create crawling rules, choose "exclude" and "file" from the
>> >   pulldown, and enter "text.txt" in the text box.
>> > - The list of crawling rules is created as follows:
>> >   1. Exclude file(s) matching text.txt
>> >   2. Include indexable file(s) matching *
>> >   3. Include directory(s) matching *
>> > - Save the job settings.
>> >
>> > As a result, the job still tries to crawl the file. I wonder why
>> > "text.txt" does not match in the crawling rule.
>> >
>> > Does anyone know what I did wrong?
>> > Version: MCF 0.5, Solr 3.5, MySQL 5.5
>> >
>> > Regards,
>> > Shigeki
>
> --
> 〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜
> SoftBank Mobile Corp.
> Information Systems Division
> System Services Business Management Department
> Service Planning Department
>
> Shigeki Kobayashi
> [email protected]
> 〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜
