Re: question about multiple languages

2012-10-09 Thread Erlend Garåsen
On 08.10.12 17.03, Maciej Liżewski wrote: Now there are two possibilities: 1. when fields are untouched - processing data (stemming, etc.) is the same for every document, which is rather wrong because Polish stemming is different from the English one... :) 2. attributes are mapped to *_lang and every

Re: question about multiple languages

2012-10-09 Thread Maciej Liżewski
Google guesses the query language. If you hit www.google.com, you will be redirected to www.google.pl if you're sitting in Poland. This may also be achieved in your application by detecting the browser's locale etc. Many web application frameworks have support for this. Then you

Re: getMaxDocumentRequest problem

2012-10-09 Thread Maciej Liżewski
Ok... it is not a getMaxDocumentRequest issue, because I was able to get it even with getMaxDocumentRequest=1. It seems to occur when indexing large sets of documents (in my case ~7000). It also happened once for the CIFS connector (with a samba share)... result is like this: Name Status Start Time

Re: getMaxDocumentRequest problem

2012-10-09 Thread Karl Wright
What is your deployment model? Is this a multiprocess deployment? What database are you using? There are various load tests for each database, which do far more than 7000 documents. I am concerned that you are seeing this because of some kind of cross-process synchronization issue, which might

Re: getMaxDocumentRequest problem

2012-10-09 Thread Karl Wright
FWIW, getting thread dumps from the agents process when it is hung may (or may not) help determine the underlying cause. Karl On Tue, Oct 9, 2012 at 9:21 AM, Karl Wright daddy...@gmail.com wrote: What is your deployment model? Is this a multiprocess deployment? What

[jira] [Created] (CONNECTORS-551) Documents from Wiki and JDBC connectors are removed whenever the agents process is started

2012-10-09 Thread Karl Wright (JIRA)
Karl Wright created CONNECTORS-551: Summary: Documents from Wiki and JDBC connectors are removed whenever the agents process is started Key: CONNECTORS-551 URL:

Re: question about multiple languages

2012-10-09 Thread Maciej Liżewski
Thanks, Erlend, for your hints! 2012/10/9 Erlend Garåsen e.f.gara...@usit.uio.no: On 09.10.12 14.19, Maciej Liżewski wrote: Google guesses the query language. If you hit www.google.com, you will be redirected to www.google.pl if you're sitting in Poland. This may also be

Re: getMaxDocumentRequest problem

2012-10-09 Thread Maciej Liżewski
Just looking at threads... but nothing special. - all worker threads are gone, - stuffer thread runs in a loop but finds nothing to do... - other threads just wait on 'sleep' commands. Is there any particular thread I should look at? I could guess that there was some exception (maybe in my

[jira] [Commented] (CONNECTORS-551) Documents from Wiki and JDBC connectors are removed whenever the agents process is started

2012-10-09 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13472405#comment-13472405 ] Karl Wright commented on CONNECTORS-551: I tried this with a JDBC connection

Re: getMaxDocumentRequest problem

2012-10-09 Thread Karl Wright
- all worker threads are gone, ??? Really?? I can think of no scenario where the worker threads disappear except if shutdown of the agents process is attempted and fails. WorkerThread catches all Throwables, logs them, and repeats in a loop. The only kind of exception that can cause the

Re: getMaxDocumentRequest problem

2012-10-09 Thread Karl Wright
What JVM are you using? Because frankly this cannot logically happen. The only other possibility is that your code is somehow throwing ManifoldCFExceptions of type ManifoldCFException.INTERRUPTED. Karl On Tue, Oct 9, 2012 at 10:20 AM, Maciej Liżewski maciej.lizew...@gmail.com wrote:

Re: getMaxDocumentRequest problem

2012-10-09 Thread Maciej Liżewski
One more thing: now, running another job connected with the same repository also hangs after seeding and no worker threads are spawned... 2012/10/9 Maciej Liżewski maciej.lizew...@gmail.com: 2012/10/9 Karl Wright daddy...@gmail.com: - all worker threads are gone, ??? Really?? yes... really..

Re: getMaxDocumentRequest problem

2012-10-09 Thread Karl Wright
ManifoldCFException.INTERRUPTED should only be thrown if the thread has been asked to shut down. It is basically the equivalent of InterruptedException in the ManifoldCF world. If the thread sees it, it exits immediately. Karl On Tue, Oct 9, 2012 at 10:29 AM, Maciej Liżewski

Re: New committer: Ahmet Arslan

2012-10-09 Thread Ahmet Arslan
Hi All, Thanks for the warm welcome. I live in Eskisehir, Turkey, and am currently doing a PhD in the area of Information Retrieval. My open source journey started in 2008 with solr/lucene and continued with some others. After some time I realised that I enjoy/love open source and I need to be part of

Re: getMaxDocumentRequest problem

2012-10-09 Thread Maciej Liżewski
Ok - that was the case... there are some exceptions which I mapped to ManifoldCFException with the INTERRUPTED flag (I just copied it from some other connector) and they were killing workers... Now those exceptions are exposed in job statuses as reasons why they failed. Thanks for the hints and your help.
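
The failure mode described in this thread can be sketched outside ManifoldCF. The following is a minimal, self-contained simulation, not ManifoldCF code: `ConnectorError`, `worker_loop`, `fetch`, `wrong_mapping` and `right_mapping` are hypothetical names invented for illustration, and the loop only mimics the behavior Karl describes (log ordinary failures and keep going, exit on an INTERRUPTED-type error).

```python
class ConnectorError(Exception):
    """Hypothetical stand-in for a framework exception carrying an error type."""
    GENERAL_ERROR = 0
    INTERRUPTED = 1

    def __init__(self, message, error_type=GENERAL_ERROR):
        super().__init__(message)
        self.error_type = error_type


def worker_loop(tasks, log):
    """Run tasks in order; ordinary failures are logged, INTERRUPTED ends the worker."""
    for task in tasks:
        try:
            task()
        except ConnectorError as e:
            if e.error_type == ConnectorError.INTERRUPTED:
                # Treated as a shutdown request: the worker stops for good.
                log.append("worker exiting: interrupted")
                return "exited"
            log.append(f"task failed: {e}")
        except Exception as e:
            # Anything else is logged and the loop continues.
            log.append(f"task failed: {e}")
    return "finished"


def fetch():
    """Hypothetical repository call that fails with an ordinary I/O error."""
    raise IOError("repository unreachable")


def wrong_mapping():
    # The mistake from the thread: an ordinary failure is reported as INTERRUPTED.
    try:
        fetch()
    except IOError as e:
        raise ConnectorError(str(e), ConnectorError.INTERRUPTED) from e


def right_mapping():
    # The same failure reported as a general error, so the worker survives it.
    try:
        fetch()
    except IOError as e:
        raise ConnectorError(str(e), ConnectorError.GENERAL_ERROR) from e


demo_log = []
print(worker_loop([right_mapping, wrong_mapping, right_mapping], demo_log), demo_log)
```

In this sketch the third task never runs, because the worker has already exited on the mislabeled error. That matches the symptom reported above, where all worker threads were gone and the job hung with nothing left to process.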

[jira] [Created] (CONNECTORS-552) Forced solr attributes in job specification and/or configuration

2012-10-09 Thread Maciej Lizewski (JIRA)
Maciej Lizewski created CONNECTORS-552: Summary: Forced solr attributes in job specification and/or configuration Key: CONNECTORS-552 URL: https://issues.apache.org/jira/browse/CONNECTORS-552

[jira] [Commented] (CONNECTORS-552) Forced solr attributes in job specification and/or configuration

2012-10-09 Thread Ahmet Arslan (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13472700#comment-13472700 ] Ahmet Arslan commented on CONNECTORS-552: bq. I have three document sources:

[jira] [Commented] (CONNECTORS-551) Documents from Wiki and JDBC connectors are removed whenever the agents process is started when continuous crawling

2012-10-09 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13472715#comment-13472715 ] Karl Wright commented on CONNECTORS-551: Problem turned out to be a map that

[jira] [Resolved] (CONNECTORS-551) Documents from Wiki and JDBC connectors are removed whenever the agents process is started when continuous crawling

2012-10-09 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-551. Resolution: Fixed Documents from Wiki and JDBC connectors are removed whenever

[jira] [Commented] (CONNECTORS-552) Forced solr attributes in job specification and/or configuration

2012-10-09 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13472725#comment-13472725 ] Karl Wright commented on CONNECTORS-552: Each output connector has the ability

[PROPOSAL] Release a ManifoldCF 1.0.1 release

2012-10-09 Thread Karl Wright
Hi folks, Due to the potential severity of CONNECTORS-551, I think it might be a good idea to release a ManifoldCF 1.0.1 release which contains the fix for this ticket. Please can I have a show of hands as to whether people agree that this is serious enough to warrant such a release. Thanks!