Re: DataImportHandler: Deleting from index and db; lastIndexed id feature

2008-12-02 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Tue, Dec 2, 2008 at 3:01 PM, Marc Sturlese [EMAIL PROTECTED] wrote:

 Hey there,

 I have my DataImportHandler almost completely configured. I am missing three
 goals. I don't think I can reach them just via XML config or a Transformer and
 SqlEntityProcessor plugin, but I need to be sure of that.
 If there's no other way I will hack some Solr source classes, and would like to
 know the best way to do that. Once I have it solved, I can upload or post
 the source in the forum in case someone thinks it can be helpful.

 1.- Every time I execute DataImportHandler (to index data from a db), at the
 start or at the end I need to delete some expired documents. I have to
 delete them from the database and from the index. I know which documents must
 be deleted because a field in the db marks them. I would rather not delete
 everything from the DB first and then everything from the index, but delete each
 document from the index and from the DB one at a time.

You can override the init() and destroy() of the SqlEntityProcessor and
use it as the processor for the root entity. At that point you can run
the necessary DB queries and Solr delete queries. Look at
Context#getSolrCore() and Context#getDataSource(String).
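
Something along these lines (an untested sketch -- the datasource name, the
expired-rows query and the column name are just placeholders, and the delete
goes through the plain 1.3 update-handler API):

import java.io.IOException;
import java.util.Iterator;
import java.util.Map;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.SqlEntityProcessor;
import org.apache.solr.update.DeleteUpdateCommand;

public class PurgingSqlEntityProcessor extends SqlEntityProcessor {

    @Override
    @SuppressWarnings("unchecked")
    public void init(Context context) {
        super.init(context);
        // Fetch the ids of the rows flagged as expired (query and column names are invented).
        Iterator<Map<String, Object>> expired = (Iterator<Map<String, Object>>)
                context.getDataSource("db").getData("SELECT id FROM documents WHERE expired = 1");
        while (expired != null && expired.hasNext()) {
            Object id = expired.next().get("id");
            if (id == null) continue;
            try {
                // Remove the corresponding document from the index.
                DeleteUpdateCommand del = new DeleteUpdateCommand();
                del.id = id.toString();
                del.fromCommitted = true;
                del.fromPending = true;
                context.getSolrCore().getUpdateHandler().delete(del);
            } catch (IOException e) {
                throw new RuntimeException("could not delete expired doc " + id, e);
            }
            // The matching DB delete would go here; JdbcDataSource#getData() is meant
            // for SELECTs, so use your own JDBC connection for the DELETE statement.
        }
    }
}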


 The delete mark is set as an update on the db row, so I think I could
 use delta-import. I don't know if deletedPkQuery is the way to do that; I can't
 find much information about how to make it work. As deltaQuery modifies
 docs (deletes the old one and inserts the new one), I suppose there must be an
 easy way to do this by only doing the delete and not the new insert.
deletedPkQuery runs first: it executes the query and uses the result
to identify the deleted rows.

 2.- This is probably my most difficult goal.
 Delta-import reads a timestamp from dataimport.properties and modifies/adds
 all documents from the db which were inserted after that date. What I want is to
 be able to save in the field the id of the last indexed doc, so that the next
 time I execute the indexer it starts indexing from that last indexed id.
You can use a Transformer to write something to the DB, via
Context#getDataSource(String), for each row.
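
A rough, untested sketch of such a transformer (the column name, JDBC URL,
credentials and tracking table are invented for the example; a real one would
reuse the connection rather than open it per row):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.Map;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

public class LastIndexedDbTransformer extends Transformer {

    @Override
    public Object transformRow(Map<String, Object> row, Context context) {
        Object id = row.get("id"); // the pk column, assumed to be called "id"
        if (id != null) {
            try {
                // Record the id of the row just handed to Solr in a tracking table.
                Connection conn = DriverManager.getConnection("jdbc:yourdb://host/db", "user", "pass");
                try {
                    PreparedStatement ps = conn.prepareStatement("UPDATE index_state SET last_indexed_id = ?");
                    ps.setObject(1, id);
                    ps.executeUpdate();
                    ps.close();
                } finally {
                    conn.close();
                }
            } catch (Exception e) {
                throw new RuntimeException("could not record last indexed id " + id, e);
            }
        }
        return row;
    }
}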

 The point of doing this is that if I do a full import from a db with lots of
 rows, the app could hit a problem in the middle of the execution and
 abort the process. The way deltaQuery works, I would have to restart the execution
 from the beginning. With this new functionality I could optimize the index
 and start again from the last indexed doc.
 I think I should begin by modifying SolrWriter.java and DocBuilder.java,
 creating functions like getStartTime, persistStartTime... for ID control.

 3.- I commented before about this last point. I want to boost doc
 fields at indexing time.
Adding field boost is a planned item.

It should work as follows:
add a special value $fieldBoost.fieldname to the row map,

and DocBuilder should respect that. You can raise a bug and we can
commit it soon.
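
Once DocBuilder supports it, a transformer could then set the boost per row,
roughly like this (note the $fieldBoost.* key is only the convention proposed
above, not something the current DocBuilder honours):

import java.util.Map;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

public class FieldBoostTransformer extends Transformer {
    @Override
    public Object transformRow(Map<String, Object> row, Context context) {
        // Boost the "title" field of this document; field name and value are examples.
        row.put("$fieldBoost.title", 2.0f);
        return row;
    }
}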
 How do I raise a bug?
https://issues.apache.org/jira/secure/CreateIssue!default.jspa

 Thanks in advance




 --
 View this message in context: 
 http://www.nabble.com/DataImportHandler%3A-Deleteing-from-index-and-db--lastIndexed-id-feature-tp20788755p20788755.html
 Sent from the Solr - User mailing list archive at Nabble.com.





-- 
--Noble Paul


Re: DataImportHandler: Deleting from index and db; lastIndexed id feature

2008-12-02 Thread Marc Sturlese

Thanks, I really appreciate your help.

I didn't explain myself very well here:

 2.-This is probably my most difficult goal.
 Deltaimport reads a timestamp from the dataimport.properties and
 modify/add
 all documents from db wich were inserted after that date. What I want is
 to
 be able to save in the field the id of the last idexed doc. So in the next
 time I ejecute the indexer make it start indexing from that last indexed
 id
 doc.
You can use a Transformer to write something to the DB.
Context#getDataSource(String) for each row

When I said:

 be able to save in the field the id of the last indexed doc
I made a mistake; what I meant was:

to be able to save in the file (dataimport.properties) the id of the last
indexed doc.
The point would be to do my own delta-query indexing from the last indexed
doc id instead of from the timestamp.
So I think this would not work in that case (my mistake, because of the
bad explanation):

You can use a Transformer to write something to the DB.
Context#getDataSource(String) for each row

That is why I was saying:
 I think I should begin modifying the SolrWriter.java and DocBuilder.java.
 Creating functions like getStartTime, persistStartTime... for ID control 

Am I heading in the right direction?
 Sorry for my English, and thanks in advance


Noble Paul നോബിള്‍ नोब्ळ् wrote:
 
 On Tue, Dec 2, 2008 at 3:01 PM, Marc Sturlese [EMAIL PROTECTED]
 wrote:

 Hey there,

 I have my dataimporthanlder almost completely configured. I am missing
 three
 goals. I don't think I can reach them just via xml conf or transformer
 and
 sqlEntitProcessor plugin. But need to be sure of that.
 If there's no other way I will hack some solr source classes, would like
 to
 know the best way to do that. Once I have it solved, I can upload or post
 the source in the forum in case someone think it can be helpful.

 1.- Every time I execute dataimporthandler (to index data from a db), at
 the
 start time or end time I need to delete some expired documents. I have to
 delete them from the database and from the index. I know wich documents
 must
 be deleted because of a field in the db that says it. Would not like to
 delete first all from DB or first all from index but one from index and
 one
 from doc every time.
 
 You can override the init() destroy() of the SqlEntityProcessor and
 use it as the processor for the root entity. At this point you can run
 the necessary db queries and solr delete queries . look at
 Context#getSolrCore() and Context#getdataSource(String)
 
 
 The delete mark is setted as an update in the db row so I think I could
 use deltaImport. Don't know If deletedPkQuery is the way to do that. Can
 not
 find so much information about how to make it work. As deltaQuery
 modifies
 docs (delete old and insert new) I supose it must be a easy way to do
 this
 just doing the delete and not the new insert.
 deletedPkQuery does everything first. it runs the query and uses that
 to identify the deleted rows.

 2.-This is probably my most difficult goal.
 Deltaimport reads a timestamp from the dataimport.properties and
 modify/add
 all documents from db wich were inserted after that date. What I want is
 to
 be able to save in the field the id of the last idexed doc. So in the
 next
 time I ejecute the indexer make it start indexing from that last indexed
 id
 doc.
 You can use a Transformer to write something to the DB.
 Context#getDataSource(String) for each row
 
 The point of doing this is that if I do a full import from a db with lots
 of
 rows the app could encounter a problem in the middle of the execution and
 abort the process. As deltaquey works I would have to restart the
 execution
 from the begining. Having this new functionality I could optimize the
 index
 and start from the last indexed doc.
 I think I should begin modifying the SolrWriter.java and DocBuilder.java.
 Creating functions like getStartTime, persistStartTime... for ID control

 3.-I commented before about this last point. I want to give boost to doc
 fields at indexing time.
Adding fieldboost is a planned item.

It must work as follows .
Add a special value $fieldBoost.fieldname to the row map

And DocBuilder should respect that. You can raise a bug and we can
commit it soon.
 How can I do to rise a bug?
 https://issues.apache.org/jira/secure/CreateIssue!default.jspa

 Thanks in advance




 --
 View this message in context:
 http://www.nabble.com/DataImportHandler%3A-Deleteing-from-index-and-db--lastIndexed-id-feature-tp20788755p20788755.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 
 
 -- 
 --Noble Paul
 
 

-- 
View this message in context: 
http://www.nabble.com/DataImportHandler%3A-Deleteing-from-index-and-db--lastIndexed-id-feature-tp20788755p20790542.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: PHPResponseWriter problem with (de)serialization (spellchecker)

2008-12-02 Thread Yonik Seeley
On Tue, Dec 2, 2008 at 6:39 AM, Steffen B. [EMAIL PROTECTED] wrote:
 Little update: this behaviour can be easily reproduced with the example
 configuration that comes with Solr:
 After uncommenting line 733 in
 apache-solr-nightly/example/solr/conf/solrconfig.xml (which activates the
 PHPS queryResponseWriter) loading this URL on the example index shows the
 same problem:
 http://localhost:8983/solr/spellCheckCompRH?cmd=q=ipod&spellcheck=true&spellcheck.extendedResults=true&spellcheck.onlyMorePopular=true&wt=phps

 [...]s:10:"spellcheck";a:1:{s:11:"suggestions";a:1:{s:16:"correctlySpelled";true}}}

 As I'm no Java crack and have neither time nor knowledge to debug the class
 myself, I can only offer to (re)open a task in Jira. Any opinions on this?

Hi Steffen, was there previously a JIRA issue already open specifically
for this problem?  If not, please do open a new issue.

-Yonik


Re: PHPResponseWriter problem with (de)serialization (spellchecker)

2008-12-02 Thread Steffen B.

No, this issue is new. But there was a general PHPResponseWriter task...
I opened a new one: https://issues.apache.org/jira/browse/SOLR-892
Feel free to move / edit / merge it. ;) I hope I made the problem clear.
~ Steffen


Yonik Seeley wrote:
 
 On Tue, Dec 2, 2008 at 6:39 AM, Steffen B. [EMAIL PROTECTED]
 wrote:
 Little update: this behaviour can be easily reproduced with the example
 configuration that comes with Solr:
 After uncommenting line 733 in
 apache-solr-nightly/example/solr/conf/solrconfig.xml (which activates the
 PHPS queryResponseWriter) loading this URL on the example index shows the
 same problem:
 http://localhost:8983/solr/spellCheckCompRH?cmd=q=ipod&spellcheck=true&spellcheck.extendedResults=true&spellcheck.onlyMorePopular=true&wt=phps

 [...]s:10:"spellcheck";a:1:{s:11:"suggestions";a:1:{s:16:"correctlySpelled";true}}}

 As I'm no Java crack and have neither time nor knowledge to debug the
 class
 myself, I can only offer to (re)open a task in Jira. Any opinions on
 this?
 
 Hi Steffen, was there previosly JIRA issue already open specifically
 for this issue?  If not, please do open a new issue.
 
 -Yonik
 
 

-- 
View this message in context: 
http://www.nabble.com/PHPResponseWriter-problem-with-%28de%29serialization-%28spellchecker%29-tp20703677p20792680.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: PHPResponseWriter problem with (de)serialization (spellchecker)

2008-12-02 Thread Steffen B.

Little update: this behaviour can be easily reproduced with the example
configuration that comes with Solr:
After uncommenting line 733 in
apache-solr-nightly/example/solr/conf/solrconfig.xml (which activates the
PHPS queryResponseWriter) loading this URL on the example index shows the
same problem:
http://localhost:8983/solr/spellCheckCompRH?cmd=q=ipod&spellcheck=true&spellcheck.extendedResults=true&spellcheck.onlyMorePopular=true&wt=phps

[...]s:10:"spellcheck";a:1:{s:11:"suggestions";a:1:{s:16:"correctlySpelled";true}}}

As I'm no Java crack and have neither time nor knowledge to debug the class
myself, I can only offer to (re)open a task in Jira. Any opinions on this?

~ Steffen


Steffen B. wrote:
 
 Hi everyone,
 maybe it's just me, but whenever I try to deserialize a Solr response that
 contains the spellchecker with spellcheck.extendedResults, it fails. I'm
 using PHP5 and everything is pretty much up-to-date.
 <lst name="spellcheck">
   <lst name="suggestions">
     <bool name="correctlySpelled">true</bool>
   </lst>
 </lst>
 will be converted to
 [...]
 s:10:"spellcheck";a:1:{s:11:"suggestions";a:1:{s:16:"correctlySpelled";true}}}
 which is not deserializable with unserialize():
 Notice: unserialize() [function.unserialize]: Error at offset 305 of 312
 bytes in /Solr/Client.php on line 131
 
 PHP, on the other hand, serializes an array this way:
 echo
 serialize(array("spellcheck"=>array("suggestions"=>array("correctlySpelled"=>true))));
 to
 a:1:{s:10:"spellcheck";a:1:{s:11:"suggestions";a:1:{s:16:"correctlySpelled";b:1;}}}
 
 So there are obviously differences in the way boolean vars are converted,
 though I'm not sure if it's a problem with the PHPResponseWriter or with
 my setup. Can anyone confirm this behaviour?
 
 

-- 
View this message in context: 
http://www.nabble.com/PHPResponseWriter-problem-with-%28de%29serialization-%28spellchecker%29-tp20703677p20790504.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Dealing with field values as key/value pairs

2008-12-02 Thread Stephane Bailliez


Yeah, sorry, I was not clear in my question. Storage would end up being done
the same way, of course.


I guess I'm more looking for feedback about what people have used as a 
strategy to handle this type of situation. This goes for faceting as well.


Assuming I facet by author and there are 2 authors with the same
name, that does not work right.


So, rediscovering the obvious here: the facet value is best expressed with
identifiers which uniquely identify your author. But then you lose the
'name' and you need a way to get it back.


But if you also want to offer the name of the author in your Solr response
in a 'standalone' way (i.e. without relying on another source of data, like
the db where that mapping is stored), then you need to store this data in a
convenient form in the index to be able to access it later.


So I'm basically looking for a design pattern / best practice for that
scenario, based on people's experience.



I was also thinking about storing each value in dynamic fields such
as 'metadata_field_identifier', and then, assuming I have a facet
'facet_field' which stores identifiers, using a search component to
provide the mapping as an 'extra' in the response, i.e. giving the mapping
in another section of the response (similar to the debug, facets, etc.)


ie: something like:
mapping: {

  'field1': { 'identifier1': 'value1', 'identifier2': 'value2' },

  'field2': { 'identifierx': 'valuex', 'identifiery': 'valuey' }
}

does that make sense ?
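
(And just so the 'flatten key and label into one stored string' fallback from
my first mail is concrete, this is roughly the encode/decode I have in mind on
the client side -- the separator choice is arbitrary:)

import java.util.LinkedHashMap;
import java.util.Map;

public class KeyValueField {
    private static final String SEP = "||"; // any token that cannot appear in the ids

    // At index time: "authorId||Display Name" goes into a multi-valued stored string field.
    public static String encode(String id, String label) {
        return id + SEP + label;
    }

    // At display time: split each stored value back into an id -> label mapping.
    public static Map<String, String> decode(Iterable<String> storedValues) {
        Map<String, String> mapping = new LinkedHashMap<String, String>();
        for (String value : storedValues) {
            int i = value.indexOf(SEP);
            if (i > 0) {
                mapping.put(value.substring(0, i), value.substring(i + SEP.length()));
            }
        }
        return mapping;
    }
}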

-- stephane

Noble Paul നോബിള്‍ नोब्ळ् wrote:

In the end Lucene stores stuff as strings.

Even if you do store your data as a map FieldType, Solr may not be able
to treat it like a map.
So it is fine to put the map in as one single string.

On Mon, Dec 1, 2008 at 10:07 PM, Stephane Bailliez [EMAIL PROTECTED] wrote:

Hi all,


I'm looking for ideas about how to best deal with a situation where I need
to deal with storing key/values pairs in the index for consumption in the
client.


Typical example would be to have a document with multiple genres where for
simplicity reasons i'd like to send both the 'id' and the 'human readable
label' (might not be the best example since one would immediatly say 'what
about localization', but in that case assume it's an entity such as company
name or a person name).

So say I have

field1 = { 'key1':'this is value1', 'key2':'this is value2' }


I was thinking the easiest (not the prettiest) solution would be to store it
as effectively a string 'key:this is the value' and then have the client
deal with this 'format' and then parse it based on 'key:value' pattern

Another alternative I was thinking may have been to use a custom field that
effectively would make the field value as a map key/value for the writer but
I'm not so sure it can really be done, haven't investigated that one deeply.

Any feedback would be welcome, solution might even be simpler and cleaner
than what I'm mentioning above, but my brain is mushy in the last couple of
weeks.

-- stephane










Re: Problem indexing on Oracle DB

2008-12-02 Thread Noble Paul നോബിള്‍ नोब्ळ्
cool

The only problem is that java.sql.Clob#getCharacterStream() is package
private and you have to use the oracle.sql.CLOB



On Tue, Dec 2, 2008 at 1:38 PM, Joel Karlsson [EMAIL PROTECTED] wrote:
 Thanks for your reply!

 I wrote such a transformer and now it seems to work perfectly. Here's the
 code for the transformer, in case anyone encounters the same problem or anyone
 wants to improve it:

 import org.apache.solr.handler.dataimport.*;
 import oracle.sql.CLOB;
 import java.util.*;
 import java.io.*;

 public class ClobTransformer extends Transformer
 {
     public Map<String, Object> transformRow(Map<String, Object> row, Context context)
     {
         // Walk every field declared for this entity in data-config.xml.
         List<Map<String, String>> fields = context.getAllEntityFields();
         for (Map<String, String> field : fields)
         {
             // Only touch columns flagged with toString="true" in the config.
             String toString = field.get("toString");
             if ("true".equals(toString))
             {
                 String columnName = field.get("column");
                 CLOB clob = (CLOB) row.get(columnName);
                 if (clob != null)
                 {
                     // Read the CLOB contents and replace the value with plain text.
                     StringBuffer strOut = new StringBuffer();
                     String app;
                     try {
                         BufferedReader br = new BufferedReader(clob.getCharacterStream());
                         while ((app = br.readLine()) != null)
                             strOut.append(app);
                     } catch (Exception e) { e.printStackTrace(); }

                     row.put(columnName, strOut.toString());
                 }
             }
         }
         return row;
     }
 }

 // Joel

 2008/12/2 Noble Paul നോബിള്‍ नोब्ळ् [EMAIL PROTECTED]

 Hi Joel,
 DIH does not translate Clob automatically to text.

 We can open that as an issue.
 meanwhile you can write a transformer of your own to read Clob and
 convert to text.

 http://wiki.apache.org/solr/DataImportHandler#head-4756038c418ab3fa389efc822277a7a789d27688


 On Tue, Dec 2, 2008 at 2:57 AM, Joel Karlsson [EMAIL PROTECTED]
 wrote:
  Thanks for your reply!
 
  I'm already using the DataImportHandler for indexing. Do I still have to
  convert the Clob myself or are there any built-in functions that I've
  missed?
 
  // Joel
 
 
  2008/12/1 Yonik Seeley [EMAIL PROTECTED]
 
  If you are querying Oracle yourself and using something like SolrJ,
  then you must convert the Clob yourself into a String representation.
 
  Also, did you look at Solr's DataImportHandler?
 
  -Yonik
 
  On Mon, Dec 1, 2008 at 3:11 PM, Joel Karlsson [EMAIL PROTECTED]
  wrote:
   Hello everyone,
  
   I'm trying to index on an Oracle DB, but can't seem to find any built
 in
   support for objects of type oracle.sql.Clob. The field I try to put
 the
  data
   into is of type text, but after indexing it only contains the
  Clob-objects
   string representation, i.e. something like [EMAIL PROTECTED]
  who
   knows how to get Solr to index the content of these objects rather
 than
  its
   string representation??
  
   Thanks in advance! // Joel
  
 
 



 --
 --Noble Paul





-- 
--Noble Paul


Re: Problem indexing on Oracle DB

2008-12-02 Thread Noble Paul നോബിള്‍ नोब्ळ्
No problem. We can fix that using reflection; I shall give a patch with that.
Probably it is better to fix it in a Transformer.



On Tue, Dec 2, 2008 at 1:56 PM, Joel Karlsson [EMAIL PROTECTED] wrote:
 True, but perhaps it works with java.sql.Clob as well, haven't tried it
 though.

 2008/12/2 Noble Paul നോബിള്‍ नोब्ळ् [EMAIL PROTECTED]

 cool

 The only problem is that java.sql.Clob#getCharacterStream() is package
 private and you have to use the oracle.sql.CLOB



 On Tue, Dec 2, 2008 at 1:38 PM, Joel Karlsson [EMAIL PROTECTED]
 wrote:
  Thanks for your reply!
 
  I wrote such a transformer and now it seems to work perfectly. Here's the
  code for the transformer if anyone encounters the same problem, or if
 anyone
  want to improve it:
 
  import org.apache.solr.handler.dataimport.*;
  import oracle.sql.CLOB;
  import java.util.*;
  import java.io.*;
 
  public class ClobTransformer extends Transformer
  {
 public MapString, Object transformRow(MapString, Object row,
 Context
  context)
 {
 ListMapString, String fields = context.getAllEntityFields();
 for (MapString, String field : fields)
 {
 String toString = field.get(toString);
 if (true.equals(toString))
 {
 String columnName = field.get(column);
 CLOB clob = (CLOB)row.get(columnName);
 if (clob != null)
 {
 StringBuffer strOut = new StringBuffer();
 String app;
 try {
 BufferedReader br = new
  BufferedReader(clob.getCharacterStream());
 while ((app=br.readLine())!=null)
 strOut.append(app);
 } catch (Exception e) { e.printStackTrace(); }
 
 row.put(columnName, strOut.toString());
 }
 }
 }
 return row;
 
 }
  }
 
  // Joel
 
  2008/12/2 Noble Paul നോബിള്‍ नोब्ळ् [EMAIL PROTECTED]
 
  Hi Joel,
  DIH does not translate Clob automatically to text.
 
  We can open that as an issue.
  meanwhile you can write a transformer of your own to read Clob and
  convert to text.
 
 
 http://wiki.apache.org/solr/DataImportHandler#head-4756038c418ab3fa389efc822277a7a789d27688
 
 
  On Tue, Dec 2, 2008 at 2:57 AM, Joel Karlsson [EMAIL PROTECTED]
  wrote:
   Thanks for your reply!
  
   I'm already using the DataImportHandler for indexing. Do I still have
 to
   convert the Clob myself or are there any built-in functions that I've
   missed?
  
   // Joel
  
  
   2008/12/1 Yonik Seeley [EMAIL PROTECTED]
  
   If you are querying Oracle yourself and using something like SolrJ,
   then you must convert the Clob yourself into a String representation.
  
   Also, did you look at Solr's DataImportHandler?
  
   -Yonik
  
   On Mon, Dec 1, 2008 at 3:11 PM, Joel Karlsson [EMAIL PROTECTED]
 
   wrote:
Hello everyone,
   
I'm trying to index on an Oracle DB, but can't seem to find any
 built
  in
support for objects of type oracle.sql.Clob. The field I try to put
  the
   data
into is of type text, but after indexing it only contains the
   Clob-objects
string representation, i.e. something like
 [EMAIL PROTECTED]
   who
knows how to get Solr to index the content of these objects rather
  than
   its
string representation??
   
Thanks in advance! // Joel
   
  
  
 
 
 
  --
  --Noble Paul
 
 



 --
 --Noble Paul





-- 
--Noble Paul


Re: Problem indexing on Oracle DB

2008-12-02 Thread Joel Karlsson
True, but perhaps it works with java.sql.Clob as well, haven't tried it
though.

2008/12/2 Noble Paul നോബിള്‍ नोब्ळ् [EMAIL PROTECTED]

 cool

 The only problem is that java.sql.Clob#getCharacterStream() is package
 private and you have to use the oracle.sql.CLOB



 On Tue, Dec 2, 2008 at 1:38 PM, Joel Karlsson [EMAIL PROTECTED]
 wrote:
  Thanks for your reply!
 
  I wrote such a transformer and now it seems to work perfectly. Here's the
  code for the transformer if anyone encounters the same problem, or if
 anyone
  want to improve it:
 
  import org.apache.solr.handler.dataimport.*;
  import oracle.sql.CLOB;
  import java.util.*;
  import java.io.*;
 
  public class ClobTransformer extends Transformer
  {
 public MapString, Object transformRow(MapString, Object row,
 Context
  context)
 {
 ListMapString, String fields = context.getAllEntityFields();
 for (MapString, String field : fields)
 {
 String toString = field.get(toString);
 if (true.equals(toString))
 {
 String columnName = field.get(column);
 CLOB clob = (CLOB)row.get(columnName);
 if (clob != null)
 {
 StringBuffer strOut = new StringBuffer();
 String app;
 try {
 BufferedReader br = new
  BufferedReader(clob.getCharacterStream());
 while ((app=br.readLine())!=null)
 strOut.append(app);
 } catch (Exception e) { e.printStackTrace(); }
 
 row.put(columnName, strOut.toString());
 }
 }
 }
 return row;
 
 }
  }
 
  // Joel
 
  2008/12/2 Noble Paul നോബിള്‍ नोब्ळ् [EMAIL PROTECTED]
 
  Hi Joel,
  DIH does not translate Clob automatically to text.
 
  We can open that as an issue.
  meanwhile you can write a transformer of your own to read Clob and
  convert to text.
 
 
 http://wiki.apache.org/solr/DataImportHandler#head-4756038c418ab3fa389efc822277a7a789d27688
 
 
  On Tue, Dec 2, 2008 at 2:57 AM, Joel Karlsson [EMAIL PROTECTED]
  wrote:
   Thanks for your reply!
  
   I'm already using the DataImportHandler for indexing. Do I still have
 to
   convert the Clob myself or are there any built-in functions that I've
   missed?
  
   // Joel
  
  
   2008/12/1 Yonik Seeley [EMAIL PROTECTED]
  
   If you are querying Oracle yourself and using something like SolrJ,
   then you must convert the Clob yourself into a String representation.
  
   Also, did you look at Solr's DataImportHandler?
  
   -Yonik
  
   On Mon, Dec 1, 2008 at 3:11 PM, Joel Karlsson [EMAIL PROTECTED]
 
   wrote:
Hello everyone,
   
I'm trying to index on an Oracle DB, but can't seem to find any
 built
  in
support for objects of type oracle.sql.Clob. The field I try to put
  the
   data
into is of type text, but after indexing it only contains the
   Clob-objects
string representation, i.e. something like
 [EMAIL PROTECTED]
   who
knows how to get Solr to index the content of these objects rather
  than
   its
string representation??
   
Thanks in advance! // Joel
   
  
  
 
 
 
  --
  --Noble Paul
 
 



 --
 --Noble Paul



Multi Language Search

2008-12-02 Thread tushar kapoor

Hi,

Before I get to the Solr-specific question, there is one thing I need
information on.

If I am a Russian user on a Russian website and I want to search the index
for two Russian words, what is the query going to look like?

1. Russian Word 1 AND Russian Word 2

or rather,

2. Russian Word 1 "AND" (written in Russian) Russian Word 2

Now over to the Solr-specific question: whichever of 1 or 2 the answer is,
how does one do it using Solr? I tried using the language analyzers,
but I'm not too sure how exactly they work.

Regards,
Tushar.
-- 
View this message in context: 
http://www.nabble.com/Multi-Language-Search-tp20789025p20789025.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: DataImportHandler: Deleting from index and db; lastIndexed id feature

2008-12-02 Thread Noble Paul നോബിള്‍ नोब्ळ्
You can write the details to a file using a Transformer itself.

It is wise to stick to the public API as far as possible. We will
maintain back-compatibility, and your code will be usable with newer versions.
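
A bare-bones, untested sketch (the file location, property key and column name
are assumptions; it simply overwrites the property for every row that goes through):

import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.util.Map;
import java.util.Properties;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

public class LastIndexedIdTransformer extends Transformer {

    private static final File STATE_FILE = new File("conf/lastindexed.properties");

    @Override
    public Object transformRow(Map<String, Object> row, Context context) {
        Object id = row.get("id"); // the pk column, assumed to be called "id"
        if (id != null) {
            Properties props = new Properties();
            props.setProperty("last.indexed.id", id.toString());
            try {
                OutputStream out = new FileOutputStream(STATE_FILE);
                try {
                    // Persist the id so an aborted import can be resumed from here.
                    props.store(out, "last id handed to Solr by DIH");
                } finally {
                    out.close();
                }
            } catch (Exception e) {
                throw new RuntimeException("could not persist last indexed id", e);
            }
        }
        return row;
    }
}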


On Tue, Dec 2, 2008 at 5:12 PM, Marc Sturlese [EMAIL PROTECTED] wrote:

 Thanks I really apreciate your help.

 I didn't explain myself so well in here:

 2.-This is probably my most difficult goal.
 Deltaimport reads a timestamp from the dataimport.properties and
 modify/add
 all documents from db wich were inserted after that date. What I want is
 to
 be able to save in the field the id of the last idexed doc. So in the next
 time I ejecute the indexer make it start indexing from that last indexed
 id
 doc.
 You can use a Transformer to write something to the DB.
 Context#getDataSource(String) for each row

 When I said:

 be able to save in the field the id of the last idexed doc
 I made a mistake, wanted to mean :

 be able to save in the file (dataimport.properties) the id of the last
 indexed doc.
 The point would be to do my own deltaquery indexing from the last doc
 indexed id instead of the timestamp.
 So I think this would not work in that case (it's my mistake because of the
 bad explanation):

You can use a Transformer to write something to the DB.
Context#getDataSource(String) for each row

 It is because I was saying:
 I think I should begin modifying the SolrWriter.java and DocBuilder.java.
 Creating functions like getStartTime, persistStartTime... for ID control

 I am in the correct direction?
  Sorry for my englis and thanks in advance


 Noble Paul നോബിള്‍ नोब्ळ् wrote:

 On Tue, Dec 2, 2008 at 3:01 PM, Marc Sturlese [EMAIL PROTECTED]
 wrote:

 Hey there,

 I have my dataimporthanlder almost completely configured. I am missing
 three
 goals. I don't think I can reach them just via xml conf or transformer
 and
 sqlEntitProcessor plugin. But need to be sure of that.
 If there's no other way I will hack some solr source classes, would like
 to
 know the best way to do that. Once I have it solved, I can upload or post
 the source in the forum in case someone think it can be helpful.

 1.- Every time I execute dataimporthandler (to index data from a db), at
 the
 start time or end time I need to delete some expired documents. I have to
 delete them from the database and from the index. I know wich documents
 must
 be deleted because of a field in the db that says it. Would not like to
 delete first all from DB or first all from index but one from index and
 one
 from doc every time.

 You can override the init() destroy() of the SqlEntityProcessor and
 use it as the processor for the root entity. At this point you can run
 the necessary db queries and solr delete queries . look at
 Context#getSolrCore() and Context#getdataSource(String)


 The delete mark is setted as an update in the db row so I think I could
 use deltaImport. Don't know If deletedPkQuery is the way to do that. Can
 not
 find so much information about how to make it work. As deltaQuery
 modifies
 docs (delete old and insert new) I supose it must be a easy way to do
 this
 just doing the delete and not the new insert.
 deletedPkQuery does everything first. it runs the query and uses that
 to identify the deleted rows.

 2.-This is probably my most difficult goal.
 Deltaimport reads a timestamp from the dataimport.properties and
 modify/add
 all documents from db wich were inserted after that date. What I want is
 to
 be able to save in the field the id of the last idexed doc. So in the
 next
 time I ejecute the indexer make it start indexing from that last indexed
 id
 doc.
 You can use a Transformer to write something to the DB.
 Context#getDataSource(String) for each row

 The point of doing this is that if I do a full import from a db with lots
 of
 rows the app could encounter a problem in the middle of the execution and
 abort the process. As deltaquey works I would have to restart the
 execution
 from the begining. Having this new functionality I could optimize the
 index
 and start from the last indexed doc.
 I think I should begin modifying the SolrWriter.java and DocBuilder.java.
 Creating functions like getStartTime, persistStartTime... for ID control

 3.-I commented before about this last point. I want to give boost to doc
 fields at indexing time.
Adding fieldboost is a planned item.

It must work as follows .
Add a special value $fieldBoost.fieldname to the row map

And DocBuilder should respect that. You can raise a bug and we can
commit it soon.
 How can I do to rise a bug?
 https://issues.apache.org/jira/secure/CreateIssue!default.jspa

 Thanks in advance




 --
 View this message in context:
 http://www.nabble.com/DataImportHandler%3A-Deleteing-from-index-and-db--lastIndexed-id-feature-tp20788755p20788755.html
 Sent from the Solr - User mailing list archive at Nabble.com.





 --
 --Noble Paul



 --
 View this message in context: 
 

Solr 1.3 - response time very long

2008-12-02 Thread sunnyfr

Hi,

I tested my old search engine, which is Sphinx, against my new one, which is
Solr, and I've got a huge difference in response time.
How can I make it faster?

Thanks a lot,
-- 
View this message in context: 
http://www.nabble.com/Solr-1.3---response-time-very-long-tp20795134p20795134.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr 1.3 - response time very long

2008-12-02 Thread Yonik Seeley
On Tue, Dec 2, 2008 at 12:04 PM, sunnyfr [EMAIL PROTECTED] wrote:
 How can I make it faster?

There's no -go-faster-please flag ;-)

Give us the exact URL and we might be able to help figure out what part is slow.

-Yonik


OOM on commit after few days

2008-12-02 Thread Sunil
I have been facing this issue for a long time in a production environment and wanted 
to know if anybody who has come across it can share their thoughts.
Appreciate your help.

Environment
2 GB index file
3.5 million documents
15 mins. time interval for committing 100 to 400 document updates
   Commit happens once in 15 mins. 
3.5 GB of RAM available for JVM
Solr Version 1.3 ; (nightly build of oct 18, 2008)

MDB - Message Driven Bean
I am not using Solr's replication mechanism. I also don't use XML post updates, 
since the amount of data is too large.
I have bundled an MDB that receives messages for data updates and uses Solr's 
update handler to update and commit the index.
Optimize happens once a day.
 
Everything runs fine for 2-3 days; after that I keep getting the following 
exceptions.

Exception
org.apache.solr.common.SolrException log java.lang.OutOfMemoryError: 
at java.io.RandomAccessFile.readBytes(Native Method)
at java.io.RandomAccessFile.read(RandomAccessFile.java:350)
at 
org.apache.lucene.store.FSDirectory$FSIndexInput.readInternal(FSDirectory.java:596)
at 
org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:136)
at 
org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:92)
at org.apache.lucene.index.SegmentReader.norms(SegmentReader.java:907)
at 
org.apache.lucene.index.MultiSegmentReader.norms(MultiSegmentReader.java:338)
at 
org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:69)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:131)
at org.apache.lucene.search.Searcher.search(Searcher.java:126)
at org.apache.lucene.search.Searcher.search(Searcher.java:105)
at 
org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1170)
at 
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:856)
at 
org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:283)
at 
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:160)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:170)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1302)
at 
org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:51)
at org.apache.solr.core.SolrCore$4.call(SolrCore.java:1128)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:284)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:665)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:690)
at java.lang.Thread.run(Thread.java:810)



Encoded search string qt=Dismax

2008-12-02 Thread tushar kapoor

Hi,

I am facing problems while searching for some encoded text as part of the
search query string. The results don't come up when I use some url encoding
with qt=dismaxrequest.

I am searching for a Russian word by posting a URL-encoded UTF-8 transformation
of the word. The query works fine for a normal request. However, no docs are
fetched when qt=dismaxrequest is appended to the query string.

The word being searched is -
Russian Word - Предварительное 

UTF8 Java Encoding -
\u041f\u0440\u0435\u0434\u0432\u0430\u0440\u0438\u0442\u0435\u043b\u044c\u043d\u043e\u0435

Posted query string (URL Encoded) - 
%5Cu041f%5Cu0440%5Cu0435%5Cu0434%5Cu0432%5Cu0430%5Cu0440%5Cu0438%5Cu0442%5Cu0435%5Cu043b%5Cu044c%5Cu043d%5Cu043e%5Cu0435

Following are the two queries and the difference in results

Query 1 - this one works fine

?q=%5Cu041f%5Cu0440%5Cu0435%5Cu0434%5Cu0432%5Cu0430%5Cu0440%5Cu0438%5Cu0442%5Cu0435%5Cu043b%5Cu044c%5Cu043d%5Cu043e%5Cu0435

Result -

<?xml version="1.0" encoding="UTF-8" ?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="q">\u041f\u0440\u0435\u0434\u0432\u0430\u0440\u0438\u0442\u0435\u043b\u044c\u043d\u043e\u0435</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="Index_Type_s">productIndex</str>
      <str name="Index_Type_str_s">productIndex</str>
      <str name="URL_s">4100018</str>
      <str name="URL_str_s">4100018</str>
      <arr name="all">
        <str>productIndex</str>
        <str>product</str>
        <str>Предварительное K математики учебная книга</str>
        <str>4100018</str>
        <str>4100018</str>
        <str>21125</str>
        <str>91048</str>
        <str>91047</str>
      </arr>
      <str name="editionTypeId_s">21125</str>
      <str name="editionTypeId_str_s">21125</str>
      <arr name="listOf_taxonomyPath">
        <str>91048</str>
        <str>91047</str>
      </arr>
      <str name="prdMainTitle_s">Предварительное K математики учебная книга</str>
      <str name="prdMainTitle_str_s">Предварительное K математики учебная книга</str>
      <str name="productType_s">product</str>
      <str name="productType_str_s">product</str>
      <arr name="strlistOf_taxonomyPath">
        <str>91048</str>
        <str>91047</str>
      </arr>
      <date name="timestamp">2008-12-02T08:14:05.63Z</date>
    </doc>
  </result>
</response>

Query 2 - qt=dismaxrequest - this one doesn't work

?q=%5Cu041f%5Cu0440%5Cu0435%5Cu0434%5Cu0432%5Cu0430%5Cu0440%5Cu0438%5Cu0442%5Cu0435%5Cu043b%5Cu044c%5Cu043d%5Cu043e%5Cu0435&qt=dismaxrequest

Result -
<?xml version="1.0" encoding="UTF-8" ?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">109</int>
    <lst name="params">
      <str name="q">\u041f\u0440\u0435\u0434\u0432\u0430\u0440\u0438\u0442\u0435\u043b\u044c\u043d\u043e\u0435</str>
      <str name="qt">dismaxrequest</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0" maxScore="0.0" />
</response>

I don't know why there is a difference when appending qt=dismaxrequest. Any help
would be appreciated.


Regards,
Tushar.
-- 
View this message in context: 
http://www.nabble.com/Encoded--search-string---qt%3DDismax-tp20797703p20797703.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: new faceting algorithm

2008-12-02 Thread wojtekpia

Is there a configurable way to switch to the previous implementation? I'd
like to see exactly how it affects performance in my case.


Yonik Seeley wrote:
 
 And if you want to verify that the new faceting code has indeed kicked
 in, some statistics are logged, like:
 
 Nov 24, 2008 11:14:32 PM org.apache.solr.request.UnInvertedField uninvert
 INFO: UnInverted multi-valued field features, memSize=14584, time=47,
 phase1=47,
  nTerms=285, bigTerms=99, termInstances=186
 
 -Yonik
 
 

-- 
View this message in context: 
http://www.nabble.com/new-faceting-algorithm-tp20674902p20797812.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: OOM on commit after few days

2008-12-02 Thread Yonik Seeley
Using embedded is always more error prone...you're probably forgetting
to close some resource.
Make sure to close all SolrQueryRequest objects.
Start with a memory profiler or heap dump to try and figure out what's
taking up all the memory.
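
For reference, the usual shape for an embedded request in 1.3 looks roughly
like this (handler name and params are placeholders) -- the important part is
the close() in the finally block:

import java.util.HashMap;
import java.util.Map;

import org.apache.solr.core.SolrCore;
import org.apache.solr.request.LocalSolrQueryRequest;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.request.SolrQueryResponse;

public class EmbeddedQuery {
    public static SolrQueryResponse run(SolrCore core, String queryString) {
        Map<String, String[]> params = new HashMap<String, String[]>();
        params.put("q", new String[] { queryString });
        SolrQueryRequest req = new LocalSolrQueryRequest(core, params);
        try {
            SolrQueryResponse rsp = new SolrQueryResponse();
            core.execute(core.getRequestHandler("standard"), req, rsp); // handler name is an example
            return rsp;
        } finally {
            req.close(); // releases the searcher reference held by the request
        }
    }
}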

-Yonik

On Tue, Dec 2, 2008 at 1:05 PM, Sunil [EMAIL PROTECTED] wrote:
 I have been facing this issue since long in production environment and wanted 
 to know if anybody came across can share their thoughts.
 Appreciate your help.

 Environment
 2 GB index file
 3.5 million documents
 15 mins. time interval for committing 100 to 400 document updates
   Commit happens once in 15 mins.
 3.5 GB of RAM available for JVM
 Solr Version 1.3 ; (nightly build of oct 18, 2008)

 MDB - Message Driven Bean
 I am Not using solr's replication mecahnism. Also don't use xml post update 
 since the amount of data is too much.
 I have bundled a MDB that receives messages for data updates and uses solr's 
 update handler to update and commit the index.
 Optimize happens once a day.

 Everything runs fine for 2-3 days; after that I keep getting following 
 exceptions.

 Exception
 org.apache.solr.common.SolrException log java.lang.OutOfMemoryError:
at java.io.RandomAccessFile.readBytes(Native Method)
at java.io.RandomAccessFile.read(RandomAccessFile.java:350)
at 
 org.apache.lucene.store.FSDirectory$FSIndexInput.readInternal(FSDirectory.java:596)
at 
 org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:136)
at 
 org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:92)
at org.apache.lucene.index.SegmentReader.norms(SegmentReader.java:907)
at 
 org.apache.lucene.index.MultiSegmentReader.norms(MultiSegmentReader.java:338)
at 
 org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:69)
at 
 org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:131)
at org.apache.lucene.search.Searcher.search(Searcher.java:126)
at org.apache.lucene.search.Searcher.search(Searcher.java:105)
at 
 org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1170)
at 
 org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:856)
at 
 org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:283)
at 
 org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:160)
at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:170)
at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1302)
at 
 org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:51)
at org.apache.solr.core.SolrCore$4.call(SolrCore.java:1128)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:284)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:665)
at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:690)
at java.lang.Thread.run(Thread.java:810)




Re: new faceting algorithm

2008-12-02 Thread Yonik Seeley
On Tue, Dec 2, 2008 at 1:10 PM, wojtekpia [EMAIL PROTECTED] wrote:
 Is there a configurable way to switch to the previous implementation? I'd
 like to see exactly how it affects performance in my case.

Thanks for the reminder, I need to document this in the wiki.

facet.method=enum  (enumerate terms and do intersections, the old default)
facet.method=fc  (fieldcache method, the new default)

-Yonik


 Yonik Seeley wrote:

 And if you want to verify that the new faceting code has indeed kicked
 in, some statistics are logged, like:

 Nov 24, 2008 11:14:32 PM org.apache.solr.request.UnInvertedField uninvert
 INFO: UnInverted multi-valued field features, memSize=14584, time=47,
 phase1=47,
  nTerms=285, bigTerms=99, termInstances=186

 -Yonik



 --
 View this message in context: 
 http://www.nabble.com/new-faceting-algorithm-tp20674902p20797812.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: new faceting algorithm

2008-12-02 Thread Noble Paul നോബിള്‍ नोब्ळ्
wojtek, you can report back the numbers if possible

It would be nice to know how the new impl performs in the real world

On Tue, Dec 2, 2008 at 11:45 PM, Yonik Seeley [EMAIL PROTECTED] wrote:
 On Tue, Dec 2, 2008 at 1:10 PM, wojtekpia [EMAIL PROTECTED] wrote:
 Is there a configurable way to switch to the previous implementation? I'd
 like to see exactly how it affects performance in my case.

 Thanks for the reminder, I need to document this in the wiki.

 facet.method=enum  (enumerate terms and do intersections, the old default)
 facet.method=fc  (fieldcache method, the new default)

 -Yonik


 Yonik Seeley wrote:

 And if you want to verify that the new faceting code has indeed kicked
 in, some statistics are logged, like:

 Nov 24, 2008 11:14:32 PM org.apache.solr.request.UnInvertedField uninvert
 INFO: UnInverted multi-valued field features, memSize=14584, time=47,
 phase1=47,
  nTerms=285, bigTerms=99, termInstances=186

 -Yonik



 --
 View this message in context: 
 http://www.nabble.com/new-faceting-algorithm-tp20674902p20797812.html
 Sent from the Solr - User mailing list archive at Nabble.com.






-- 
--Noble Paul


RE: DataImportHandler: Deleting from index and db; lastIndexed id feature

2008-12-02 Thread Lance Norskog
Does the DIH delta feature rewrite the delta-import file for each set of rows? 
If it does not, that sounds like a bug/enhancement. 
Lance

-Original Message-
From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, December 02, 2008 8:51 AM
To: solr-user@lucene.apache.org
Subject: Re: DataImportHandler: Deleting from index and db; lastIndexed id 
feature

You can write the details to a file using a Transformer itself.

It is wise to stick to the public API as far as possible. We will maintain back 
compat and your code will be usable w/ newer versions.


On Tue, Dec 2, 2008 at 5:12 PM, Marc Sturlese [EMAIL PROTECTED] wrote:

 Thanks I really apreciate your help.

 I didn't explain myself so well in here:

 2.-This is probably my most difficult goal.
 Deltaimport reads a timestamp from the dataimport.properties and 
 modify/add all documents from db wich were inserted after that date. 
 What I want is to be able to save in the field the id of the last 
 idexed doc. So in the next time I ejecute the indexer make it start 
 indexing from that last indexed id doc.
 You can use a Transformer to write something to the DB.
 Context#getDataSource(String) for each row

 When I said:

 be able to save in the field the id of the last idexed doc
 I made a mistake, wanted to mean :

 be able to save in the file (dataimport.properties) the id of the last 
 indexed doc.
 The point would be to do my own deltaquery indexing from the last doc 
 indexed id instead of the timestamp.
 So I think this would not work in that case (it's my mistake because 
 of the bad explanation):

You can use a Transformer to write something to the DB.
Context#getDataSource(String) for each row

 It is because I was saying:
 I think I should begin modifying the SolrWriter.java and DocBuilder.java.
 Creating functions like getStartTime, persistStartTime... for ID 
 control

 I am in the correct direction?
  Sorry for my englis and thanks in advance


 Noble Paul നോബിള്‍ नोब्ळ् wrote:

 On Tue, Dec 2, 2008 at 3:01 PM, Marc Sturlese 
 [EMAIL PROTECTED]
 wrote:

 Hey there,

 I have my dataimporthanlder almost completely configured. I am 
 missing three goals. I don't think I can reach them just via xml 
 conf or transformer and sqlEntitProcessor plugin. But need to be 
 sure of that.
 If there's no other way I will hack some solr source classes, would 
 like to know the best way to do that. Once I have it solved, I can 
 upload or post the source in the forum in case someone think it can 
 be helpful.

 1.- Every time I execute dataimporthandler (to index data from a 
 db), at the start time or end time I need to delete some expired 
 documents. I have to delete them from the database and from the 
 index. I know wich documents must be deleted because of a field in 
 the db that says it. Would not like to delete first all from DB or 
 first all from index but one from index and one from doc every time.

 You can override the init() destroy() of the SqlEntityProcessor and 
 use it as the processor for the root entity. At this point you can 
 run the necessary db queries and solr delete queries . look at
 Context#getSolrCore() and Context#getdataSource(String)


 The delete mark is setted as an update in the db row so I think I 
 could use deltaImport. Don't know If deletedPkQuery is the way to do 
 that. Can not find so much information about how to make it work. As 
 deltaQuery modifies docs (delete old and insert new) I supose it 
 must be a easy way to do this just doing the delete and not the new 
 insert.
 deletedPkQuery does everything first. it runs the query and uses that 
 to identify the deleted rows.

 2.-This is probably my most difficult goal.
 Deltaimport reads a timestamp from the dataimport.properties and 
 modify/add all documents from db wich were inserted after that date. 
 What I want is to be able to save in the field the id of the last 
 idexed doc. So in the next time I ejecute the indexer make it start 
 indexing from that last indexed id doc.
 You can use a Transformer to write something to the DB.
 Context#getDataSource(String) for each row

 The point of doing this is that if I do a full import from a db with 
 lots of rows the app could encounter a problem in the middle of the 
 execution and abort the process. As deltaquey works I would have to 
 restart the execution from the begining. Having this new 
 functionality I could optimize the index and start from the last 
 indexed doc.
 I think I should begin modifying the SolrWriter.java and DocBuilder.java.
 Creating functions like getStartTime, persistStartTime... for ID 
 control

 3.-I commented before about this last point. I want to give boost to 
 doc fields at indexing time.
Adding fieldboost is a planned item.

It must work as follows .
Add a special value $fieldBoost.fieldname to the row map

And DocBuilder should respect that. You can raise a bug and we can 
commit it soon.
 How can I do to rise a bug?
 

RE: Encoded search string qt=Dismax

2008-12-02 Thread Feak, Todd
Do you have a dismaxrequest request handler defined in your solr config xml? 
Or is it dismax?

-Todd Feak

-Original Message-
From: tushar kapoor [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, December 02, 2008 10:07 AM
To: solr-user@lucene.apache.org
Subject: Encoded search string  qt=Dismax


Hi,

I am facing problems while searching for some encoded text as part of the
search query string. The results don't come up when I use some url encoding
with qt=dismaxrequest.

I am searching a Russian word by posting a URL encoded UTF8 transformation
of the word. The query works fine for normal request. However, no docs are
fetched when qt=dismaxrequest is appended as part of the query string.

The word being searched is -
Russian Word - Предварительное 

UTF8 Java Encoding -
\u041f\u0440\u0435\u0434\u0432\u0430\u0440\u0438\u0442\u0435\u043b\u044c\u043d\u043e\u0435

Posted query string (URL Encoded) - 
%5Cu041f%5Cu0440%5Cu0435%5Cu0434%5Cu0432%5Cu0430%5Cu0440%5Cu0438%5Cu0442%5Cu0435%5Cu043b%5Cu044c%5Cu043d%5Cu043e%5Cu0435

Following are the two queries and the difference in results

Query 1 - this one works fine

?q=%5Cu041f%5Cu0440%5Cu0435%5Cu0434%5Cu0432%5Cu0430%5Cu0440%5Cu0438%5Cu0442%5Cu0435%5Cu043b%5Cu044c%5Cu043d%5Cu043e%5Cu0435

Result -

?xml version=1.0 encoding=UTF8 ? 
 response
 lst name=responseHeader
  int name=status0/int 
  int name=QTime0/int 
 lst name=params
  str
name=q\u041f\u0440\u0435\u0434\u0432\u0430\u0440\u0438\u0442\u0435\u043b\u044c\u043d\u043e\u0435/str
 
  /lst
  /lst
 result name=response numFound=1 start=0
 doc
  str name=Index_Type_sproductIndex/str 
  str name=Index_Type_str_sproductIndex/str 
  str name=URL_s4100018/str 
  str name=URL_str_s4100018/str 
 arr name=all
  strproductIndex/str 
  strproduct/str 
  strПредварительное K математики учебная книга/str 
  str4100018/str 
  str4100018/str 
  str21125/str 
  str91048/str 
  str91047/str 
  /arr
  str name=editionTypeId_s21125/str 
  str name=editionTypeId_str_s21125/str 
 arr name=listOf_taxonomyPath
  str91048/str 
  str91047/str 
  /arr
  str name=prdMainTitle_sПредварительное K математики учебная
книга/str 
  str name=prdMainTitle_str_sПредварительное K математики учебная
книга/str 
  str name=productType_sproduct/str 
  str name=productType_str_sproduct/str 
 arr name=strlistOf_taxonomyPath
  str91048/str 
  str91047/str 
  /arr
  date name=timestamp20081202T08:14:05.63Z/date 
  /doc
  /result
  /response

Query 2 - qt=dismaxrequest - This doesnt work

?q=%5Cu041f%5Cu0440%5Cu0435%5Cu0434%5Cu0432%5Cu0430%5Cu0440%5Cu0438%5Cu0442%5Cu0435%5Cu043b%5Cu044c%5Cu043d%5Cu043e%5Cu0435qt=dismaxrequest

Result -
  ?xml version=1.0 encoding=UTF8 ? 
 response
 lst name=responseHeader
  int name=status0/int 
  int name=QTime109/int 
 lst name=params
  str
name=q\u041f\u0440\u0435\u0434\u0432\u0430\u0440\u0438\u0442\u0435\u043b\u044c\u043d\u043e\u0435/str
 
  str name=qtdismaxrequest/str 
  /lst
  /lst
  result name=response numFound=0 start=0 maxScore=0.0 / 
  /response

Dont know why there is a difference on appending qt=dismaxrequest. Any help
would be appreciated.


Regards,
Tushar.
-- 
View this message in context: 
http://www.nabble.com/Encoded--search-string---qt%3DDismax-tp20797703p20797703.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: new faceting algorithm

2008-12-02 Thread wojtekpia

Definitely, but it'll take me a few days. I'll also report findings on
SOLR-465. (I've been on holiday for a few weeks)


Noble Paul നോബിള്‍ नोब्ळ् wrote:
 
 wojtek, you can report back the numbers if possible
 
 It would be nice to know how the new impl performs in real-world
 
 
 

-- 
View this message in context: 
http://www.nabble.com/new-faceting-algorithm-tp20674902p20798456.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: DataImportHandler: Deleting from index and db; lastIndexed id feature

2008-12-02 Thread Noble Paul നോബിള്‍ नोब्ळ्
delta-import file?


On Wed, Dec 3, 2008 at 12:08 AM, Lance Norskog [EMAIL PROTECTED] wrote:
 Does the DIH delta feature rewrite the delta-import file for each set of 
 rows? If it does not, that sounds like a bug/enhancement.
 Lance

 -Original Message-
 From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, December 02, 2008 8:51 AM
 To: solr-user@lucene.apache.org
 Subject: Re: DataImportHandler: Deleteing from index and db; lastIndexed id 
 feature

 You can write the details to a file using a Transformer itself.

 It is wise to stick to the public API as far as possible. We will maintain 
 back compat and your code will be usable w/ newer versions.


 On Tue, Dec 2, 2008 at 5:12 PM, Marc Sturlese [EMAIL PROTECTED] wrote:

 Thanks I really apreciate your help.

 I didn't explain myself so well in here:

 2.-This is probably my most difficult goal.
 Deltaimport reads a timestamp from the dataimport.properties and
 modify/add all documents from db wich were inserted after that date.
 What I want is to be able to save in the field the id of the last
 idexed doc. So in the next time I ejecute the indexer make it start
 indexing from that last indexed id doc.
 You can use a Transformer to write something to the DB.
 Context#getDataSource(String) for each row

 When I said:

 be able to save in the field the id of the last idexed doc
 I made a mistake, wanted to mean :

 be able to save in the file (dataimport.properties) the id of the last
 indexed doc.
 The point would be to do my own deltaquery indexing from the last doc
 indexed id instead of the timestamp.
 So I think this would not work in that case (it's my mistake because
 of the bad explanation):

You can use a Transformer to write something to the DB.
Context#getDataSource(String) for each row

 It is because I was saying:
 I think I should begin modifying the SolrWriter.java and DocBuilder.java.
 Creating functions like getStartTime, persistStartTime... for ID
 control

 I am in the correct direction?
  Sorry for my englis and thanks in advance


 Noble Paul നോബിള്‍ नोब्ळ् wrote:

 On Tue, Dec 2, 2008 at 3:01 PM, Marc Sturlese
 [EMAIL PROTECTED]
 wrote:

 Hey there,

 I have my dataimporthanlder almost completely configured. I am
 missing three goals. I don't think I can reach them just via xml
 conf or transformer and sqlEntitProcessor plugin. But need to be
 sure of that.
 If there's no other way I will hack some solr source classes, would
 like to know the best way to do that. Once I have it solved, I can
 upload or post the source in the forum in case someone think it can
 be helpful.

 1.- Every time I execute dataimporthandler (to index data from a
 db), at the start time or end time I need to delete some expired
 documents. I have to delete them from the database and from the
 index. I know wich documents must be deleted because of a field in
 the db that says it. Would not like to delete first all from DB or
 first all from index but one from index and one from doc every time.

 You can override the init() destroy() of the SqlEntityProcessor and
 use it as the processor for the root entity. At this point you can
 run the necessary db queries and solr delete queries . look at
 Context#getSolrCore() and Context#getdataSource(String)


 The delete mark is setted as an update in the db row so I think I
 could use deltaImport. Don't know If deletedPkQuery is the way to do
 that. Can not find so much information about how to make it work. As
 deltaQuery modifies docs (delete old and insert new) I supose it
 must be a easy way to do this just doing the delete and not the new
 insert.
 deletedPkQuery does everything first. it runs the query and uses that
 to identify the deleted rows.

 2.-This is probably my most difficult goal.
 Deltaimport reads a timestamp from the dataimport.properties and
 modify/add all documents from db wich were inserted after that date.
 What I want is to be able to save in the field the id of the last
 idexed doc. So in the next time I ejecute the indexer make it start
 indexing from that last indexed id doc.
 You can use a Transformer to write something to the DB.
 Context#getDataSource(String) for each row

 The point of doing this is that if I do a full import from a db with
 lots of rows the app could encounter a problem in the middle of the
 execution and abort the process. As deltaquey works I would have to
 restart the execution from the begining. Having this new
 functionality I could optimize the index and start from the last
 indexed doc.
 I think I should begin modifying the SolrWriter.java and DocBuilder.java.
 Creating functions like getStartTime, persistStartTime... for ID
 control

 3.-I commented before about this last point. I want to give boost to
 doc fields at indexing time.
Adding fieldboost is a planned item.

It must work as follows .
Add a special value $fieldBoost.fieldname to the row map

And DocBuilder should respect that. You can raise a bug and we can
commit it 

DataImportHandler - newbie question

2008-12-02 Thread Jae Joo
Hey,

I am trying to connect to the Oracle database and index the values into Solr,
but I am getting the error
"Document [null] missing required field: id".

Here is the debug output.
<str name="Total Requests made to DataSource">1</str>
<str name="Total Rows Fetched">2</str>
<str name="Total Documents Skipped">0</str>
<str name="Full Dump Started">2008-12-02 13:49:35</str>
<str name="">
Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.
</str>

schema.xml
<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="subject" type="text" indexed="true" stored="true" omitNorms="true"/>

 </fields>
 <uniqueKey>id</uniqueKey>


data-config.xml

<dataConfig>
  <dataSource driver="oracle.jdbc.driver.OracleDriver"
      url="jdbc:oracle:thin:@x.x.x.x:" user="..." password="..."/>
  <document name="companyQAIndex">
    <entity name="companyqa" pk="id" query="select * from solr_test">
      <field column="id" name="id" />
      <field column="text" name="subject" />
    </entity>
  </document>
</dataConfig>

Database Schema
id  is the pk.
There are only 2 rows in the table solr_test.

Can anyone help me figure out what I am doing wrong?

Jae


Re: DataImportHandler - newbie question

2008-12-02 Thread Jae Joo
I actually found the problem. Oracle returns the field name as Capital.
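In other words, the explicit mapping has to use the upper-case column names the
driver reports -- something like this (assuming default Oracle identifier casing):

  <field column="ID" name="id" />
  <field column="TEXT" name="subject" />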

On Tue, Dec 2, 2008 at 1:57 PM, Jae Joo [EMAIL PROTECTED] wrote:

 Hey,

 I am trying to connect the Oracle database and index the values into solr,
 but I ma getting the
 Document [null] missing required field: id.

 Here is the debug output.
 <str name="Total Requests made to DataSource">1</str>
 <str name="Total Rows Fetched">2</str>
 <str name="Total Documents Skipped">0</str>
 <str name="Full Dump Started">2008-12-02 13:49:35</str>
 <str name="">
 Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.
 </str>

 schema.xml
 <field name="id" type="string" indexed="true" stored="true" required="true" />
 <field name="subject" type="text" indexed="true" stored="true" omitNorms="true"/>

  </fields>
  <uniqueKey>id</uniqueKey>


 data-config.xml

 <dataConfig>
   <dataSource driver="oracle.jdbc.driver.OracleDriver"
       url="jdbc:oracle:thin:@x.x.x.x:" user="..." password="..."/>
   <document name="companyQAIndex">
     <entity name="companyqa" pk="id" query="select * from solr_test">
       <field column="id" name="id" />
       <field column="text" name="subject" />
     </entity>
   </document>
 </dataConfig>

 Database Schema
 id  is the pk.
 There are only 2 rows in the table solr_test.

 Will anyone help me what I am wrong?

 Jae




Re: Encoded search string qt=Dismax

2008-12-02 Thread Chris Hostetter

First of all...

standard request handler uses the default search field specified in your 
schema.xml -- dismax does not.  dismax looks at the qf param to decide 
which fields to search for the q param.  if you started with the example 
schema the dismax handler may have a default value for qf which is 
trying to query different fields than you actually use in your documents.

debugQuery=true will show you exactly what query structure (and on which 
fields) each request is using.
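A quick sanity check (host/port and field names here are made up -- substitute the 
ones from your schema):

http://localhost:8983/solr/select?qt=dismax&q=word&qf=title+description&debugQuery=true

The parsedquery section of the debug output will show exactly which fields the word 
was searched against.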

Second...

I don't know Russian, and character encoding issues tend to make my head 
spin, but the fact that the responseHeader is echoing back a q param 
containing java string literal sequences suggests that you are doing 
something wrong.  you should be sending the URL encoding of the actual 
characters -- that is, the URL encoding of the actual Russian word, not the URL 
encoding of the java string literal encoding of the Russian word.  I 
suspect the fact that you are getting any results at all from your first 
query is a fluke.

The str name=q in the responseHeader should show you the real word you 
want to search for -- once it does, then you'll know that you have the 
URL+UTF8 encoding issues straightened out.  *THEN* i would worry about the 
dismax/standard behavior.

:  <lst name="params">
:   <str name="q">\u041f\u0440\u0435\u0434\u0432\u0430\u0440\u0438\u0442\u0435\u043b\u044c\u043d\u043e\u0435</str>
:  </lst>
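For example, that q param decodes to the word Предварительное; sent correctly, the 
request would carry the percent-encoded UTF-8 bytes

  q=%D0%9F%D1%80%D0%B5%D0%B4%D0%B2%D0%B0%D1%80%D0%B8%D1%82%D0%B5%D0%BB%D1%8C%D0%BD%D0%BE%D0%B5

(no \uXXXX escapes anywhere), and the responseHeader would then echo back the 
Cyrillic word itself.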


-Hoss



Re: OOM on commit after few days

2008-12-02 Thread Sunil
Thanks Yonik

The main search still happens through SolrDispatchFilter so SolrQueryRequest is 
getting closed implicitly.

But I do use the direct API in the following cases, so please suggest any other 
possible resource issues:

1. Update and commit:
  core.getUpdateHandler();
  Here I close the updateHandler once the updates/commits are done.

2. Searching in other cores from the current core's writer:
  I have a requirement to aggregate the data from multiple indexes and send a 
single XML response.
  otherCore.getSearcher(), then call the search method to get a reference to the hits.
  I do call decref() on the RefCounted once done processing the result.

3. I also call a core reload after commit. This brings down the RAM usage but does 
not solve the main issue; with the reload I don't see any leaks, but the OOM 
error still occurs after 2-3 days.

Do you think any other resource is not getting closed?
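
For reference, the acquire/release pattern for case 2 looks roughly like this (a 
sketch against the 1.3 embedded API; the actual query and aggregation are elided):

import org.apache.solr.core.SolrCore;
import org.apache.solr.request.LocalSolrQueryRequest;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.SolrIndexSearcher;
import org.apache.solr.util.RefCounted;

public class OtherCoreSearch {

  // Borrow a searcher from another core and always give the reference back.
  void searchOtherCore(SolrCore otherCore) {
    RefCounted<SolrIndexSearcher> ref = otherCore.getSearcher();
    try {
      SolrIndexSearcher searcher = ref.get();
      // ... run the query against 'searcher' and collect the docs to aggregate ...
    } finally {
      ref.decref();  // skipping this keeps old searchers (and their caches) alive forever
    }
  }

  // Same idea for hand-built requests: close them, or the searcher they hold leaks.
  void runLocalRequest(SolrCore core) {
    SolrQueryRequest req =
        new LocalSolrQueryRequest(core, new java.util.HashMap<String, String[]>());
    try {
      // ... core.execute(handler, req, rsp) ...
    } finally {
      req.close();
    }
  }
}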

Sunil


--- On Tue, 12/2/08, Yonik Seeley [EMAIL PROTECTED] wrote:

 From: Yonik Seeley [EMAIL PROTECTED]
 Subject: Re: OOM on commit after few days
 To: solr-user@lucene.apache.org
 Date: Tuesday, December 2, 2008, 1:13 PM
 Using embedded is always more error prone...you're
 probably forgetting
 to close some resource.
 Make sure to close all SolrQueryRequest objects.
 Start with a memory profiler or heap dump to try and figure
 out what's
 taking up all the memory.
 
 -Yonik
 
 On Tue, Dec 2, 2008 at 1:05 PM, Sunil
 [EMAIL PROTECTED] wrote:
  I have been facing this issue since long in production
 environment and wanted to know if anybody came across can
 share their thoughts.
  Appreciate your help.
 
  Environment
  2 GB index file
  3.5 million documents
  15 mins. time interval for committing 100 to 400
 document updates
Commit happens once in 15 mins.
  3.5 GB of RAM available for JVM
  Solr Version 1.3 ; (nightly build of oct 18, 2008)
 
  MDB - Message Driven Bean
  I am Not using solr's replication mecahnism. Also
 don't use xml post update since the amount of data is
 too much.
  I have bundled a MDB that receives messages for data
 updates and uses solr's update handler to update and
 commit the index.
  Optimize happens once a day.
 
  Everything runs fine for 2-3 days; after that I keep
 getting following exceptions.
 
  Exception
  org.apache.solr.common.SolrException log
 java.lang.OutOfMemoryError:
 at java.io.RandomAccessFile.readBytes(Native
 Method)
 at
 java.io.RandomAccessFile.read(RandomAccessFile.java:350)
 at
 org.apache.lucene.store.FSDirectory$FSIndexInput.readInternal(FSDirectory.java:596)
 at
 org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:136)
 at
 org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:92)
 at
 org.apache.lucene.index.SegmentReader.norms(SegmentReader.java:907)
 at
 org.apache.lucene.index.MultiSegmentReader.norms(MultiSegmentReader.java:338)
 at
 org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:69)
 at
 org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:131)
 at
 org.apache.lucene.search.Searcher.search(Searcher.java:126)
 at
 org.apache.lucene.search.Searcher.search(Searcher.java:105)
 at
 org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1170)
 at
 org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:856)
 at
 org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:283)
 at
 org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:160)
 at
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:170)
 at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
 at
 org.apache.solr.core.SolrCore.execute(SolrCore.java:1302)
 at
 org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:51)
 at
 org.apache.solr.core.SolrCore$4.call(SolrCore.java:1128)
 at
 java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:284)
 at
 java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:665)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:690)
 at java.lang.Thread.run(Thread.java:810)
 
 


RE: NIO not working yet

2008-12-02 Thread Burton-West, Tom
Thanks Yonik,

-The next nightly build (Dec-01-2008) should have the changes.

The latest nightly build seems to be 30-Nov-2008 08:20,
http://people.apache.org/builds/lucene/solr/nightly/ 
has the version with the NIO fix been built?  Are we looking in the
wrong place?

Tom

Tom Burton-West
Information Retrieval Programmer
Digital Library Production Services
University of Michigan Library

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
Seeley
Sent: Sunday, November 30, 2008 8:43 PM
To: solr-user@lucene.apache.org
Subject: Re: NIO not working yet

OK, the development version of Solr should now be fixed (i.e. NIO should
be the default for non-Windows platforms).  The next nightly build
(Dec-01-2008) should have the changes.

-Yonik

On Wed, Nov 12, 2008 at 2:59 PM, Yonik Seeley [EMAIL PROTECTED] wrote:
 NIO support in the latest Solr development versions does not work yet 
 (I previously advised that some people with possible lock contention 
 problems try it out).  We'll let you know when it's fixed, but in the 
 meantime you can always set the system property 
 org.apache.lucene.FSDirectory.class to 
 org.apache.lucene.store.NIOFSDirectory to try it out.

 for example:

 java 
 -Dorg.apache.lucene.FSDirectory.class=org.apache.lucene.store.NIOFSDir
 ectory
  -jar start.jar

 -Yonik


Re: NIO not working yet

2008-12-02 Thread Yonik Seeley
On Tue, Dec 2, 2008 at 3:41 PM, Burton-West, Tom [EMAIL PROTECTED] wrote:
 Thanks Yonik,

 -The next nightly build (Dec-01-2008) should have the changes.

 The latest nightly build seems to be 30-Nov-2008 08:20,
 http://people.apache.org/builds/lucene/solr/nightly/
 has the version with the NIO fix been built?  Are we looking in the
 wrong place?

If the tests fail (which they seem to have for the last 2 days) then a
new snapshot is not uploaded.

Hudson also does solr builds though:
http://hudson.zones.apache.org/hudson/job/Solr-trunk/lastSuccessfulBuild/artifact/trunk/dist/

-Yonik


 Tom

 Tom Burton-West
 Information Retrieval Programmer
 Digital Library Production Services
 University of Michigan Library

 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
 Seeley
 Sent: Sunday, November 30, 2008 8:43 PM
 To: solr-user@lucene.apache.org
 Subject: Re: NIO not working yet

 OK, the development version of Solr should now be fixed (i.e. NIO should
 be the default for non-Windows platforms).  The next nightly build
 (Dec-01-2008) should have the changes.

 -Yonik

 On Wed, Nov 12, 2008 at 2:59 PM, Yonik Seeley [EMAIL PROTECTED] wrote:
 NIO support in the latest Solr development versions does not work yet
 (I previously advised that some people with possible lock contention
 problems try it out).  We'll let you know when it's fixed, but in the
 meantime you can always set the system property
 org.apache.lucene.FSDirectory.class to
 org.apache.lucene.store.NIOFSDirectory to try it out.

 for example:

 java
 -Dorg.apache.lucene.FSDirectory.class=org.apache.lucene.store.NIOFSDir
 ectory
  -jar start.jar

 -Yonik



SolrSharp

2008-12-02 Thread Grant Ingersoll

Anyone know the status of SolrSharp?  Is it actively maintained?

Thanks,
Grant


Load balancing for distributed Solr

2008-12-02 Thread Burton-West, Tom
Hello all,

As I understand distributed Solr, a request for a distributed search
goes to a particular Solr instance with a list of arguments specifying
the addresses of the shards to search.  The Solr instance to which the
request is first directed is responsible for distributing the query to
the other shards and pulling together the results.  My questions are:

1. Does it make sense to:
 A. Always have the same Solr instance responsible for distributing the
query to the other shards, or
 B. Rotate which shard does the distributing/result aggregating?

2. For scenario A, are there different requirements (memory,cpu,
processors etc) for the machine doing the distribution versus the
machines hosting the shards responding to the distributed requests?

3. For scenario B, are people using some kind of load balancing to
distribute which Solr instance acts as the query distributor/response
aggregator? 

Tom

Tom Burton-West
Information Retrieval Programmer
Digital Library Production Services
University of Michigan


 


Re: DataImportHandler: Deleteing from index and db; lastIndexed id feature

2008-12-02 Thread Marc Sturlese

Do you mean the file used by DataImportHandler called dataimport.properties?
If you mean that one, it's written at the end of the indexing process. The
written date will be used by the delta-query in the next run to identify
the new or modified rows from the database.

What I am trying to do is, instead of saving a timestamp, save the last
indexed id. Doing that, in the next execution I will start indexing from the
last doc that was indexed in the previous run. But I am still a bit
confused about how to do that...

Noble Paul നോബിള്‍ नोब्ळ् wrote:
 
 delta-import file?
 
 
 On Wed, Dec 3, 2008 at 12:08 AM, Lance Norskog [EMAIL PROTECTED] wrote:
 Does the DIH delta feature rewrite the delta-import file for each set of
 rows? If it does not, that sounds like a bug/enhancement.
 Lance

 -Original Message-
 From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, December 02, 2008 8:51 AM
 To: solr-user@lucene.apache.org
 Subject: Re: DataImportHandler: Deleteing from index and db; lastIndexed
 id feature

 You can write the details to a file using a Transformer itself.

 It is wise to stick to the public API as far as possible. We will
 maintain back compat and your code will be usable w/ newer versions.


 On Tue, Dec 2, 2008 at 5:12 PM, Marc Sturlese [EMAIL PROTECTED]
 wrote:

 Thanks I really apreciate your help.

 I didn't explain myself so well in here:

 2.-This is probably my most difficult goal.
 Deltaimport reads a timestamp from the dataimport.properties and
 modify/add all documents from db wich were inserted after that date.
 What I want is to be able to save in the field the id of the last
 idexed doc. So in the next time I ejecute the indexer make it start
 indexing from that last indexed id doc.
 You can use a Transformer to write something to the DB.
 Context#getDataSource(String) for each row

 When I said:

 be able to save in the field the id of the last idexed doc
 I made a mistake, wanted to mean :

 be able to save in the file (dataimport.properties) the id of the last
 indexed doc.
 The point would be to do my own deltaquery indexing from the last doc
 indexed id instead of the timestamp.
 So I think this would not work in that case (it's my mistake because
 of the bad explanation):

You can use a Transformer to write something to the DB.
Context#getDataSource(String) for each row

 It is because I was saying:
 I think I should begin modifying the SolrWriter.java and
 DocBuilder.java.
 Creating functions like getStartTime, persistStartTime... for ID
 control

 I am in the correct direction?
  Sorry for my englis and thanks in advance


 Noble Paul നോബിള്‍ नोब्ळ् wrote:

 On Tue, Dec 2, 2008 at 3:01 PM, Marc Sturlese
 [EMAIL PROTECTED]
 wrote:

 Hey there,

 I have my dataimporthanlder almost completely configured. I am
 missing three goals. I don't think I can reach them just via xml
 conf or transformer and sqlEntitProcessor plugin. But need to be
 sure of that.
 If there's no other way I will hack some solr source classes, would
 like to know the best way to do that. Once I have it solved, I can
 upload or post the source in the forum in case someone think it can
 be helpful.

 1.- Every time I execute dataimporthandler (to index data from a
 db), at the start time or end time I need to delete some expired
 documents. I have to delete them from the database and from the
 index. I know wich documents must be deleted because of a field in
 the db that says it. Would not like to delete first all from DB or
 first all from index but one from index and one from doc every time.

 You can override the init() destroy() of the SqlEntityProcessor and
 use it as the processor for the root entity. At this point you can
 run the necessary db queries and solr delete queries . look at
 Context#getSolrCore() and Context#getdataSource(String)


 The delete mark is setted as an update in the db row so I think I
 could use deltaImport. Don't know If deletedPkQuery is the way to do
 that. Can not find so much information about how to make it work. As
 deltaQuery modifies docs (delete old and insert new) I supose it
 must be a easy way to do this just doing the delete and not the new
 insert.
 deletedPkQuery does everything first. it runs the query and uses that
 to identify the deleted rows.

 2.-This is probably my most difficult goal.
 Deltaimport reads a timestamp from the dataimport.properties and
 modify/add all documents from db wich were inserted after that date.
 What I want is to be able to save in the field the id of the last
 idexed doc. So in the next time I ejecute the indexer make it start
 indexing from that last indexed id doc.
 You can use a Transformer to write something to the DB.
 Context#getDataSource(String) for each row

 The point of doing this is that if I do a full import from a db with
 lots of rows the app could encounter a problem in the middle of the
 execution and abort the process. As deltaquey works I would have to
 restart the execution 

Re: Solr 1.3 - response time very long

2008-12-02 Thread sunnyfr

Hi Matthew, Hi Yonik,

...sorry for the flag... didn't want to...

Solr 1.3  / Apache 5.5

Data directory size: 7.9G
I'm using jMeter to send HTTP requests; I'm sending exactly the same requests to Solr
and to Sphinx (MySQL), both over HTTP.

solr
http://test-search.com/test/selector?cache=0&backend=solr&request=/relevance/search/dog
sphinx
http://test-search.com/test/selector?cache=0&backend=mysql&request=/relevance/search/dog

When there are more than 4 threads it's getting slower. For a big test of 40 minutes,
ramping up to 100 threads/sec for Solr as for Sphinx, the average at the end is
3 sec for Solr and 1 sec for Sphinx.

solrconfig.xml :  http://www.nabble.com/file/p20802690/solrconf.xml
solrconf.xml 

schema.xml:
 <fields>
   <field name="id"                       type="sint"   indexed="true" stored="true"  omitNorms="true" />

   <field name="duration"                 type="sint"   indexed="true" stored="false" omitNorms="true" />
   <field name="created"                  type="date"   indexed="true" stored="true"  omitNorms="true" />
   <field name="modified"                 type="date"   indexed="true" stored="false" omitNorms="true" />
   <field name="rating_binrate"           type="sint"   indexed="true" stored="true"  omitNorms="true" />
   <field name="user_id"                  type="sint"   indexed="true" stored="false" omitNorms="true" />
   <field name="country"                  type="string" indexed="true" stored="false" omitNorms="true" />
   <field name="language"                 type="string" indexed="true" stored="true"  omitNorms="true" />
   ...
   <field name="stat_views"               type="sint"   indexed="true" stored="true"  omitNorms="true" />
   <field name="stat_views_today"         type="sint"   indexed="true" stored="false" omitNorms="true" />
   <field name="stat_views_last_week"     type="sint"   indexed="true" stored="false" omitNorms="true" />
   <field name="stat_views_last_month"    type="sint"   indexed="true" stored="false" omitNorms="true" />
   <field name="stat_comments"            type="sint"   indexed="true" stored="false" omitNorms="true" />
   <field name="stat_comments_today"      type="sint"   indexed="true" stored="false" omitNorms="true" />
   <field name="stat_comments_last_week"  type="sint"   indexed="true" stored="false" omitNorms="true" />
   <field name="stat_comments_last_month" type="sint"   indexed="true" stored="false" omitNorms="true" />
   ...
   <field name="title"          type="text"    indexed="true" stored="true"  />
   <field name="title_fr"       type="text_fr" indexed="true" stored="false" />
   <field name="title_en"       type="text_en" indexed="true" stored="false" />
   <field name="title_de"       type="text_de" indexed="true" stored="false" />
   <field name="title_es"       type="text_es" indexed="true" stored="false" />
   <field name="title_ru"       type="text_ru" indexed="true" stored="false" />
   <field name="title_pt"       type="text_pt" indexed="true" stored="false" />
   <field name="title_nl"       type="text_nl" indexed="true" stored="false" />
   <field name="title_el"       type="text_el" indexed="true" stored="false" />
   <field name="title_ja"       type="text_ja" indexed="true" stored="false" />
   <field name="title_it"       type="text_it" indexed="true" stored="false" />

   <field name="description"    type="text"    indexed="true" stored="true"  />
   <field name="description_fr" type="text_fr" indexed="true" stored="false" />
   <field name="description_en" type="text_en" indexed="true" stored="false" />
   <field name="description_de" type="text_de" indexed="true" stored="false" />
   <field name="description_es" type="text_es" indexed="true" stored="false" />
   <field name="description_ru" type="text_ru" indexed="true" stored="false" />
   <field name="description_pt" type="text_pt" indexed="true" stored="false" />
   <field name="description_nl" type="text_nl" indexed="true" stored="false" />
   <field name="description_el" type="text_el" indexed="true" stored="false" />
   <field name="description_ja" type="text_ja" indexed="true" stored="false" />

   <field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
   <field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>
   <field name="spell" type="textSpell" indexed="true" stored="true" multiValued="true"/>
   <dynamicField name="random*" type="random" />
 </fields>

What would you reckon?
Thanks a lot,




Matthew Runo wrote:
 
 Could you provide more information? How big is the index? How are you  
 searching it? Some examples might help pin down the issue.
 
 How long are the queries taking? How long did they take on Sphinx?
 
 Thanks for your time!
 
 Matthew Runo
 Software Engineer, Zappos.com
 [EMAIL PROTECTED] - 702-943-7833
 
 On Dec 2, 2008, at 9:04 AM, sunnyfr wrote:
 

 Hi,

 I tested my old search engine which is sphinx and my new one which  
 solr and
 I've got a uge difference of result.
 How can I make it faster?

 Thanks a lot,

Re: SolrSharp

2008-12-02 Thread Otis Gospodnetic
I don't think it is.  There is another C# client up on Google Code, but I'm not 
sure how well that one is maintained...


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Grant Ingersoll [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Tuesday, December 2, 2008 4:02:33 PM
 Subject: SolrSharp
 
 Anyone know the status of SolrSharp?  Is it actively maintained?
 
 Thanks,
 Grant



Query ID range? possible?

2008-12-02 Thread cstadler18
We are using Solr and would like to know: is there a query syntax to retrieve 
the newest X records, in descending order?
Our id field is simply that (a unique record identifier), so ideally we would 
want to get the last, say, 100 records added.

Possible?

Also is there a special way it needs to be defined in the schema?
 <uniqueKey>id</uniqueKey>
 <field name="id" type="text" indexed="true" stored="true" required="true" omitNorms="false" />

In addition, what if we want the last 100 records added (order by id desc),
filtered by another field -- say media type A, for example:
<field name="media" type="string" indexed="true" stored="true" omitNorms="true" required="false"/>
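
Something like the following is what we are hoping is possible (illustrative only -- 
host name made up, and I am not sure the id field as defined above can be sorted on; 
it may need to be a non-tokenized type such as string or sint):

http://localhost:8983/solr/select?q=*:*&sort=id+desc&rows=100

and with the media filter added:

http://localhost:8983/solr/select?q=media:A&sort=id+desc&rows=100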

Thanks.
-Craig


Trying to exclude integer field with certain numbers

2008-12-02 Thread Jake Conk
Hello,

I am trying to exclude certain records from my search results in my
query by specifying which ones I don't want back but it's not working
as expected. Here is my query:

+message:test AND (-thread_id:123 OR -thread_id:456 OR -thread_id:789)

So basically I just want anything back that has the word test
anywhere in the message text field and does not contain the thread id
123, 456, or 789.

When I execute that query I get no results back. When I just execute
+message:test then I get results back and some of them with the thread
ids I listed above but when I try to exclude them like that it doesn't
work.

Anyone have any idea how do I fix this?
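
Would rewriting it with the negative clauses at the top level be the right way to
express this? Something like (just a guess on my part):

+message:test -thread_id:123 -thread_id:456 -thread_id:789

or equivalently:

+message:test -thread_id:(123 OR 456 OR 789)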

Thanks,

- Jake C.


Re: DataImportHandler: Deleteing from index and db; lastIndexed id feature

2008-12-02 Thread Noble Paul നോബിള്‍ नोब्ळ्
OK . I guess I see it.  I am thinking of exposing the writes to the
properties file via an API.

say Context#persist(key,value);


This can write the data to the dataimport.properties.

You must be able to retrieve that value by ${dataimport.persist.key}

or through an API, Context.getPersistValue(key)

You can raise an issue and give a patch and we can get it committed

I guess this is what you wish to achieve
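
For example, a custom Transformer could then look roughly like this (sketch only:
Context#persist is the proposed method described above and does not exist yet; the
pk column is assumed to be "id"):

import java.util.Map;
import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

public class LastIdTransformer extends Transformer {
  public Object transformRow(Map<String, Object> row, Context context) {
    Object id = row.get("id");   // "id" = whatever your pk column is called
    if (id != null) {
      // proposed API: the value would end up in dataimport.properties at the end of the run
      context.persist("last.indexed.id", id.toString());
    }
    return row;
  }
}

and the next delta query could read the value back as
${dataimport.persist.last.indexed.id}.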

--Noble



On Wed, Dec 3, 2008 at 3:28 AM, Marc Sturlese [EMAIL PROTECTED] wrote:

 Do you mean the file used by dataimporthandler called dataimport.properties?
 If you mean this one it's writen at the end of the indexing proccess. The
 writen date will be used in the next indexation by delta-query to identify
 the new or modified rows from the database.

 What I am trying to do is instead of saving a timestamp save the last
 indexed id. Doing that, in the next execution I will start indexing from the
 last doc that was indexed in the previous indexation. But I am still a bit
 confused about how to do that...

 Noble Paul നോബിള്‍ नोब्ळ् wrote:

 delta-import file?


 On Wed, Dec 3, 2008 at 12:08 AM, Lance Norskog [EMAIL PROTECTED] wrote:
 Does the DIH delta feature rewrite the delta-import file for each set of
 rows? If it does not, that sounds like a bug/enhancement.
 Lance

 -Original Message-
 From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, December 02, 2008 8:51 AM
 To: solr-user@lucene.apache.org
 Subject: Re: DataImportHandler: Deleteing from index and db; lastIndexed
 id feature

 You can write the details to a file using a Transformer itself.

 It is wise to stick to the public API as far as possible. We will
 maintain back compat and your code will be usable w/ newer versions.


 On Tue, Dec 2, 2008 at 5:12 PM, Marc Sturlese [EMAIL PROTECTED]
 wrote:

 Thanks I really apreciate your help.

 I didn't explain myself so well in here:

 2.-This is probably my most difficult goal.
 Deltaimport reads a timestamp from the dataimport.properties and
 modify/add all documents from db wich were inserted after that date.
 What I want is to be able to save in the field the id of the last
 idexed doc. So in the next time I ejecute the indexer make it start
 indexing from that last indexed id doc.
 You can use a Transformer to write something to the DB.
 Context#getDataSource(String) for each row

 When I said:

 be able to save in the field the id of the last idexed doc
 I made a mistake, wanted to mean :

 be able to save in the file (dataimport.properties) the id of the last
 indexed doc.
 The point would be to do my own deltaquery indexing from the last doc
 indexed id instead of the timestamp.
 So I think this would not work in that case (it's my mistake because
 of the bad explanation):

You can use a Transformer to write something to the DB.
Context#getDataSource(String) for each row

 It is because I was saying:
 I think I should begin modifying the SolrWriter.java and
 DocBuilder.java.
 Creating functions like getStartTime, persistStartTime... for ID
 control

 I am in the correct direction?
  Sorry for my englis and thanks in advance


 Noble Paul നോബിള്‍ नोब्ळ् wrote:

 On Tue, Dec 2, 2008 at 3:01 PM, Marc Sturlese
 [EMAIL PROTECTED]
 wrote:

 Hey there,

 I have my dataimporthanlder almost completely configured. I am
 missing three goals. I don't think I can reach them just via xml
 conf or transformer and sqlEntitProcessor plugin. But need to be
 sure of that.
 If there's no other way I will hack some solr source classes, would
 like to know the best way to do that. Once I have it solved, I can
 upload or post the source in the forum in case someone think it can
 be helpful.

 1.- Every time I execute dataimporthandler (to index data from a
 db), at the start time or end time I need to delete some expired
 documents. I have to delete them from the database and from the
 index. I know wich documents must be deleted because of a field in
 the db that says it. Would not like to delete first all from DB or
 first all from index but one from index and one from doc every time.

 You can override the init() destroy() of the SqlEntityProcessor and
 use it as the processor for the root entity. At this point you can
 run the necessary db queries and solr delete queries . look at
 Context#getSolrCore() and Context#getdataSource(String)


 The delete mark is setted as an update in the db row so I think I
 could use deltaImport. Don't know If deletedPkQuery is the way to do
 that. Can not find so much information about how to make it work. As
 deltaQuery modifies docs (delete old and insert new) I supose it
 must be a easy way to do this just doing the delete and not the new
 insert.
 deletedPkQuery does everything first. it runs the query and uses that
 to identify the deleted rows.

 2.-This is probably my most difficult goal.
 Deltaimport reads a timestamp from the dataimport.properties and
 modify/add all documents from db wich were inserted after that date.
 What I 

Re: Load balancing for distributed Solr

2008-12-02 Thread Jeremy Hinegardner
On Tue, Dec 02, 2008 at 04:34:08PM -0500, Burton-West, Tom wrote:
 Hello all,
 
 As I understand distributed Solr, a request for a distributed search
 goes to a particular Solr instance with a list of arguments specifying
 the addresses of the shards to search.  The Solr instance to which the
 request is first directed is responsible for distributing the query to
 the other shards and pulling together the results.  My questions are:
 
 1 Does it make sense to 
  A.  Always have the same Solr instance responsible for distributing the
 query to the other shards
or 
  B.   Rotate which shard does the distributing/result aggregating?  
 
 2. For scenario A, are there different requirements (memory,cpu,
 processors etc) for the machine doing the distribution versus the
 machines hosting the shards responding to the distributed requests?
 
 3. For scenario B, are people using some kind of load balancing to
 distribute which Solr instance acts as the query distributor/response
 aggregator? 

We use Scenario B, we have 4 Solr instances (4 machines), each with N data
SolrCores and 1 'default' Core which does the dispatch and aggregation of
requests between the 4*N total data cores.  We then use HAproxy to load balance
the requests between the dispatch Cores.
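
For reference, a request that HAProxy forwards to one of the dispatch cores looks
roughly like this (host and core names made up):

http://solr1:8983/solr/dispatch/select?q=foo&shards=solr1:8983/solr/core0,solr2:8983/solr/core0,solr3:8983/solr/core0,solr4:8983/solr/core0

The dispatch core fans the query out to the listed data cores and merges the
results before responding.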

enjoy,

-jeremy

-- 

 Jeremy Hinegardner  [EMAIL PROTECTED]