SuggestComponent in distributed (SolrCloud) environment

2014-10-09 Thread Frank Wesemann
Hi,
I'm about to integrate the SuggestComponent in our application and noticed
some behavior I didn't expect. My Solr version is 4.9.

1. Terms that occur on several shards are returned once per shard, i.e.
n times for n shards.
2. Due to how the suggestions from each shard are collected, the
"exactMatchFirst" parameter on the LookupImpl is practically ignored.
3. At least the Jaspell lookup returns terms from deleted documents.

Is this expected behavior or am I missing something?
My config is quite "defaulty":

  
  FSTLookupFactory
  fst_mit_threshold
  HighFrequencyDictionaryFactory
  0.007
  suggestions/
  true
  suggest_context
  suggestContextAnalyzer

  

  

  true
  default
  fst_mit_threshold
  20
  /suggest
  json


  suggest

  

After a very short glimpse at the sources I think the first two issues
should be resolvable by plugging another Queue implementation into
SuggestComponent's finishStage().

I am quite unsure about no. 3. After all, these are suggestions, so nobody
guarantees that the suggested terms yield results, but it feels a little
strange from the user's point of view.

Any thoughts on this?
If anybody is interested, I can open an Issue in JIRA and work on 1 and 2.
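
For points 1 and 2, the merge step could look roughly like the following. This is only a sketch under assumptions: ShardSuggestionMerger and Suggestion are hypothetical names and not part of Solr, and keeping the highest weight per term is just one possible way to collapse duplicates.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of Solr: merges per-shard suggestion lists. */
final class ShardSuggestionMerger {

    /** Hypothetical stand-in for one suggestion: a term and its weight. */
    static final class Suggestion {
        final String term;
        final long weight;

        Suggestion(String term, long weight) {
            this.term = term;
            this.weight = weight;
        }
    }

    /**
     * Merges the shard results so that each term appears once (keeping its
     * highest weight), an exact match for the query comes first, and the
     * rest is ordered by descending weight.
     */
    static List<Suggestion> merge(List<List<Suggestion>> perShard, String query, int count) {
        Map<String, Suggestion> byTerm = new LinkedHashMap<>();
        for (List<Suggestion> shard : perShard) {
            for (Suggestion s : shard) {
                // Keep only the heaviest entry per term (point 1).
                byTerm.merge(s.term, s, (a, b) -> a.weight >= b.weight ? a : b);
            }
        }
        List<Suggestion> merged = new ArrayList<>(byTerm.values());
        merged.sort(Comparator
                // false sorts before true, so an exact match leads (point 2).
                .comparing((Suggestion s) -> !s.term.equals(query))
                .thenComparing(Comparator.comparingLong((Suggestion s) -> s.weight).reversed())
                .thenComparing((Suggestion s) -> s.term));
        return merged.size() > count ? new ArrayList<>(merged.subList(0, count)) : merged;
    }
}
```

Whether duplicate weights should be maximised or summed across shards is an open design question; the sketch only shows where such a decision would sit.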


-- 
Kind regards,

Frank Wesemann
Fotofinder GmbH         VAT ID: DE812854514
Software Development    Web: http://www.fotofinder.com/
Potsdamer Str. 96       Tel: +49 30 25 79 28 90
10785 Berlin            Fax: +49 30 25 79 28 999

Registered office: Berlin
Amtsgericht Berlin Charlottenburg (HRB 73099)
Managing director: Ali Paczensky


Re: DIH full-import - when is commit() actually triggered?

2011-07-19 Thread Frank Wesemann

Ahmet Arslan wrote:

> > I am running a full import with a quite plain data-config
> > (a root entity with three sub entities) from a JDBC
> > datasource.
> > This import is expected to add approximately 10 million
> > documents.
> > What I now see from my logfiles is that a newSearcher
> > event is fired about every five seconds.
>
> This is triggered by autoCommit every 300,000 milliseconds.
> You need to remove 30 to disable this mechanism.

Thanks Ahmet,
indeed I had to remove that entry. Now a commit happens only
every five minutes.







DIH full-import - when is commit() actually triggered?

2011-07-15 Thread Frank Wesemann

Hello,
I am running a full import with a quite plain data-config (a root entity
with three sub entities) from a JDBC datasource.

This import is expected to add approximately 10 million documents.
What I now see from my logfiles is that a newSearcher event is fired
about every five seconds.

This causes a lot of load on the machine.
Searching for "*:*" via the admin interface shows that with every
new commit about 1,000 docs are newly added.
This is the "batchSize" I configured in the datasource definition, but I
don't think that this is related.

in solrconfig I have


   10
   
   10  
   30




What other parameters in solrconfig.xml or in my data-config may be 
related to this behaviour?

Any hint is appreciated.

Thanks
frank






Re: Error using Custom Functions for DIH

2011-07-11 Thread Frank Wesemann

Aviraj Chavan wrote:

> public class PrepareQuery extends Evaluator
> {
>     @Override
>     public String evaluate(VariableResolver arg0, String arg1) {
>         String subQueryStr = "select ID, CATEGORY_NAME from CATEGORY_MASTER where ID=" + arg1;
>         return subQueryStr.toString();
>     }
> }

As stated in the wiki and the Javadoc, the signature of the evaluate
method is:


public String evaluate(String expression, Context context) 



http://wiki.apache.org/solr/DataImportHandler offers more info on this 
subject.
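
A minimal sketch of the quoted class with that signature could look like this. To keep it compilable on its own, Context and Evaluator are declared here as bare stand-ins; in a real project both come from org.apache.solr.handler.dataimport. The sketch also assumes that the expression already holds the ID value, which may not be the case; depending on the DIH version, variable references in the expression might have to be resolved first.

```java
// Stand-ins for the DIH classes of the same name, declared only so that
// this sketch compiles without the Solr jars. They are assumptions.
abstract class Context {
}

abstract class Evaluator {
    public abstract String evaluate(String expression, Context context);
}

class PrepareQuery extends Evaluator {
    @Override
    public String evaluate(String expression, Context context) {
        // Assumption: expression already holds the ID to look up.
        return "select ID, CATEGORY_NAME from CATEGORY_MASTER where ID=" + expression;
    }
}
```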





> Thanks
> Aviraj




  








Re: DIH abort doesn't close datasources

2011-06-16 Thread Frank Wesemann

Shalin,
thank you for the answer.
I indeed didn't look into clearCache().
I thought it would just do that ( clear caches ). :)

Shalin Shekhar Mangar wrote:

> The abort command just sets an atomic boolean flag which is checked
> frequently by the import threads to see if they should stop. If you look at
> DataImporter.java's doFullImport or doDeltaImport methods, you'll see that
> config.clearCaches is the cleanup method which is called in a finally
> block. So the data sources should be closed once the import actually aborts.
> Note that there may be a time lag between calling the abort method and the
> import actually aborting if the import threads are waiting for I/O.
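
The pattern described in the quote can be sketched in isolation as follows. AbortableImport is a hypothetical class and not the DIH code; the counter and the boolean are placeholders for indexing a row and closing the data sources.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical importer, not the DIH code: shows the flag-plus-finally pattern. */
final class AbortableImport {
    private final AtomicBoolean stop = new AtomicBoolean(false);
    private volatile boolean resourcesClosed = false;
    private int processed = 0;

    /** Only sets the flag; the import notices it at its next check. */
    void abort() {
        stop.set(true);
    }

    void run(List<String> rows) {
        try {
            for (String row : rows) {
                if (stop.get()) {
                    break; // checked between rows, so a row blocked in I/O delays the abort
                }
                processed++; // placeholder for indexing the row
            }
        } finally {
            // Runs on normal completion, on abort and on exceptions alike.
            resourcesClosed = true; // placeholder for closing the data sources
        }
    }

    boolean resourcesClosed() {
        return resourcesClosed;
    }

    int processed() {
        return processed;
    }
}
```

Because the flag is only read between rows, the cleanup in the finally block cannot run before the current row has finished, which matches the time lag mentioned in the quote.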

  








Re: Multiple indexes

2011-06-15 Thread Frank Wesemann

You'll configure multiple cores:
http://wiki.apache.org/solr/CoreAdmin

> Hi.
>
> How to have multiple indexes in SOLR, with different fields and
> different types of data?
>
> Thank you very much!
> Bye.
  








DIH abort doesn't close datasources

2011-06-15 Thread Frank Wesemann

Hi,
I just came across this:
If I abort an import via /dataimport/?command=abort, the connections to
the database (in my case) stay open.
Shouldn't DocBuilder#rollback() call something like cleanup(), which in
turn tries to close EntityProcessors, DataSources etc., instead of
relying on finalize() to do its job at some point?







Re: Solr CoreAdmin create ignores dataDir Parameter

2010-09-13 Thread Frank Wesemann

MitchK wrote:

> Frank,
>
> have a look at SOLR-646.
>
> Do you think a workaround for the data-dir-tag in the solrconfig.xml can
> help?
> I think about something like ${solr./data/corename} for
> illustration.
>
> Unfortunately I am not very skilled in working with solr's variables and
> therefore I do not know what variables are available.

No, variables are not available at this stage.

> If we find a solution, we should provide it as a suggestion at the wiki's
> CoreAdmin-page.
>
> Kind regards,
> - Mitch
  








Re: Solr CoreAdmin create ignores dataDir Parameter

2010-09-10 Thread Frank Wesemann

Mark Miller wrote:

> I think so - what version of Solr are you using? I believe I've changed
> this on trunk a few months ago.

We are running 1.4.2 and I looked in the solr/tags/release-1.4.1 branch
of SVN.
The version in trunk that I can see is from 27.07.2010, and it also reads
the config first and then the CoreDescriptor.

I added a comment to SOLR-1905 regarding this issue.

> - Mark
  








Solr CoreAdmin create ignores dataDir Parameter

2010-09-10 Thread Frank Wesemann

Hello,
if I try to create a new SolrCore based on an existing one via
the CoreAdmin HTTP API,


http://localhost:8983/solr/admin/cores?action=CREATE&name=newCore&instanceDir=old_instance&schema=newSchema.xml&dataDir=newdata 


the dataDir parameter is ignored.
Instead, the dataDir from solrconfig.xml is used.

I had a look at the sources and saw that the CoreContainer's create()
method
calls the SolrCore constructor with a dataDir value of "null", which
leads to the dataDir being read primarily from the config and not from the
CoreDescriptor.


Shouldn't the CoreDescriptor, being more specific, take precedence over 
the config?
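
The precedence asked for here could be sketched like this. DataDirResolver is a hypothetical helper and not Solr code; the parameter names are assumptions chosen for illustration.

```java
/** Hypothetical helper, not Solr code: picks the data directory for a new core. */
final class DataDirResolver {

    /**
     * Prefers the value given for the individual core (e.g. via the CREATE
     * request), then the value from the config, then a fallback.
     */
    static String resolve(String descriptorDataDir, String configDataDir, String fallback) {
        if (descriptorDataDir != null && !descriptorDataDir.isEmpty()) {
            return descriptorDataDir;
        }
        if (configDataDir != null && !configDataDir.isEmpty()) {
            return configDataDir;
        }
        return fallback;
    }
}
```

With this order, a request that passes dataDir=newdata would win over the dataDir from the config, and the config value would still apply when the request does not specify one.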







inconsistency in SolrParams.get()

2010-05-04 Thread Frank Wesemann

Dear list,
I recently stumbled upon this:

modifiableParams = new ModifiableSolrParams( req.getParams() );

assert modifiableParams.get("key").equals( req.getParams().get("key") );

This test fails for requests built by a SimpleRequestParser or
StandardRequestParser where the parameter "key" was given but empty
(e.g. localhost:8393/select/?key=&para1=val1&parm2=val2).


The reason is that oas.request.ServletSolrParams returns null for values 
with length() == 0,

but all other SolrParams implementations return the empty String.

This behaviour has also side effects on search components:
Most, if not all, standard search components check for something like

if (req.getParams().getBool(myTriggerParameter, false) ) {

   ...do what I am supposed to do...

}


In the case of ServletSolrParams, getBool() returns the desired and expected
"false",

all other implementations throw a "bad request" exception.
One may argue that supplying a parameter with an empty value is indeed a
malformed request,
but as an example, on our frontend servers we use a Perl library which always
adds the "q" parameter to a Solr request

(and our Solr implementation allows requests without an explicit query).

Nonetheless I think the above-mentioned equality check should hold true
for any request and any SolrParams.
Because I cannot foresee all the implications, I currently don't have a
better suggestion to achieve this than
to make ServletSolrParams also return the empty String, which is in my
opinion counter-intuitive and does not do the right thing in the
getBool(), getInt() etc. cases.
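
The two behaviours can be reproduced without Solr in a small sketch. SimpleParams is a hypothetical stand-in and not the SolrParams API; the emptyAsNull switch mimics the difference described above, and IllegalArgumentException stands in for the "bad request" exception.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a SolrParams-like class; not the Solr API. */
final class SimpleParams {
    private final Map<String, String> values = new HashMap<>();
    private final boolean emptyAsNull;

    /** emptyAsNull = true mimics the servlet-backed variant described above. */
    SimpleParams(boolean emptyAsNull) {
        this.emptyAsNull = emptyAsNull;
    }

    SimpleParams set(String key, String value) {
        values.put(key, value);
        return this;
    }

    String get(String key) {
        String v = values.get(key);
        return (emptyAsNull && v != null && v.isEmpty()) ? null : v;
    }

    boolean getBool(String key, boolean def) {
        String v = get(key);
        if (v == null) {
            return def; // an absent (or null-mapped) value falls back to the default
        }
        if ("true".equals(v)) {
            return true;
        }
        if ("false".equals(v)) {
            return false;
        }
        // An empty string ends up here, like any other non-boolean value.
        throw new IllegalArgumentException("invalid boolean value: '" + v + "'");
    }
}
```

For the same request with an empty "key", the first variant returns null and the default from getBool(), while the second returns the empty String and fails in getBool().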

Any thoughts?

btw:
is it really the desired and expected behaviour that
ModifiableSolrParams.set(key, null) removes key from the params?









Re: Solr under tomcat - UTF-8 issue

2010-01-22 Thread Frank Wesemann

Glock, Thomas wrote:

> My Flex client's HTTPService by default only sets the content-type request
> header to "application/x-www-form-urlencoded". What it needed to do for
> Tomcat is set the content-type request header to
> "application/x-www-form-urlencoded; charset=UTF-8".



  
As some browsers do not send this particular content-type correctly (at
least Firefox and Safari skip the "charset=utf-8" part),

I added a servlet Filter:

import java.io.IOException;
import javax.servlet.*;

public class RequestCharset2utf8Filter implements javax.servlet.Filter {
    ...
    public void doFilter(ServletRequest req, ServletResponse res,
            FilterChain chain) throws IOException, ServletException {
        // Must run before any request parameter is read.
        req.setCharacterEncoding("UTF-8");
        chain.doFilter(req, res);
    }
}

as the first filter to my webapp:
in web.xml:

 
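<filter>
  <filter-name>CharsetEncodingFilter</filter-name>
  <filter-class>my.package.servlet.RequestCharset2utf8Filter</filter-class>
</filter>
<filter-mapping>
  <filter-name>CharsetEncodingFilter</filter-name>
  <url-pattern>/*</url-pattern>
</filter-mapping>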
 


I run it on Tomcat 6.0.18.

And:
wonder is of course right, but life isn't all beer and skittles.
