Re: Solr Clustering

2012-09-17 Thread Denis Kuzmenok



Sorry for the late response. To be precise, here is what I want:

* I get documents all the time. Let's assume they are news articles (it's
a very similar problem).

* Every time I get a new batch of news I should add them to the Solr index
and get cluster information for each document, then store that information
in the DB (so I know each document's cluster).

* I can't wait for a cluster-definition service/program to be launched from
time to time; clusters should be defined on the fly.

* I want to be able to get clusters only for some period of time (for
example, I want to search for clusters only among documents that were
loaded one month ago).

* I will have tens of thousands of new documents every day and an overall
base of several million.

I'm reading Mahout in Action now, but maybe you can point me to what I
need.
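Not an answer about Mahout specifically, but the "clusters on the fly" requirement can be prototyped with a simple incremental assignment: vectorize each incoming document, compare it against the existing cluster centroids, and either join the best cluster or open a new one. A hedged sketch only; the threshold, names, and the naive running-mean update are mine, not anything from Solr or Mahout:

```python
import math
from collections import defaultdict

def cosine(a, b):
    # cosine similarity of two sparse term->weight dicts
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class IncrementalClusterer:
    """Assign each incoming document to the nearest centroid, or open a new cluster."""
    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.centroids = {}          # cluster_id -> term->weight centroid
        self.sizes = defaultdict(int)
        self._next_id = 0

    def assign(self, vector):
        best_id, best_sim = None, 0.0
        for cid, centroid in self.centroids.items():
            sim = cosine(vector, centroid)
            if sim > best_sim:
                best_id, best_sim = cid, sim
        if best_id is None or best_sim < self.threshold:
            best_id = self._next_id
            self._next_id += 1
            self.centroids[best_id] = dict(vector)
            self.sizes[best_id] = 1
            return best_id
        # fold the new document into an (approximate) running-mean centroid
        n = self.sizes[best_id]
        c = self.centroids[best_id]
        for t, w in vector.items():
            c[t] = (c.get(t, 0.0) * n + w) / (n + 1)
        self.sizes[best_id] = n + 1
        return best_id
```

The returned cluster id is what you would store in the DB and index as a Solr field per document.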
--- Original message ---
From: Chandan Tamrakar chandan.tamra...@nepasoft.com
To: solr-user@lucene.apache.org
Date: September 4, 2012, 12:30:56
Subject: Re: Solr Clustering





Yes, there is a Solr component if you want to cluster Solr documents; check
the following link: http://wiki.apache.org/solr/ClusteringComponent
Carrot2 might be good if you want to cluster a few thousand documents,
for example clustering the search results when a user searches Solr.

Mahout is much more scalable, and you will probably need Hadoop for that.


thanks
chandan

On Tue, Sep 4, 2012 at 2:10 PM, Denis Kuzmenok forward...@ukr.net wrote:



  Original Message 
 Subject: Solr Clustering
 From: Denis Kuzmenok forward...@ukr.net
 To: solr-user@lucene.apache.org CC:

 Hi, all.
 I know there are Carrot2 and Mahout for clustering. I want to implement
 the following:
 I fetch documents and want to group them into clusters as they are added
 to the index (e.g., to filter out similar documents within a 1-week
 window). I need these documents quickly, so I can't rely on postponed
 calculations. Each document should have an assigned cluster id (i.e., group
 similar documents into clusters and assign each document its cluster id).
 It's similar to news aggregators like Google News. I don't need to
 search for clusters with documents older than 1 week (for example). Each
 document will have its unique id and will be saved into the DB, but Solr
 will also have a cluster id field.
 Is it possible to implement this with Solr/Carrot2/Mahout?




-- 
Chandan Tamrakar



Solr Clustering

2012-09-04 Thread Denis Kuzmenok


 Original Message 
Subject: Solr Clustering
From: Denis Kuzmenok forward...@ukr.net
To: solr-user@lucene.apache.org
CC: 

Hi, all.
I know there are Carrot2 and Mahout for clustering. I want to implement
the following:
I fetch documents and want to group them into clusters as they are added
to the index (e.g., to filter out similar documents within a 1-week
window). I need these documents quickly, so I can't rely on postponed
calculations. Each document should have an assigned cluster id (i.e., group
similar documents into clusters and assign each document its cluster id).
It's similar to news aggregators like Google News. I don't need to
search for clusters with documents older than 1 week (for example). Each
document will have its unique id and will be saved into the DB, but Solr
will also have a cluster id field.
Is it possible to implement this with Solr/Carrot2/Mahout?


Field grouping?

2011-08-31 Thread Denis Kuzmenok
Hi.

Suppose I have a price field with different values, and I want to
get ranges for this field based on doc counts. For example, I want
to get 5 ranges for 100 docs with 20 docs in each range, 6 ranges for
200 docs = ~34 docs in each range, etc.

Is this possible with Solr?
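As far as I know, Solr's range facets use fixed-width buckets, so equal-doc-count buckets ("5 ranges for 100 docs with 20 docs each") are usually computed client-side from the sorted values. A hedged sketch of that splitting step, plain Python and nothing Solr-specific:

```python
def equal_count_ranges(prices, num_ranges):
    """Split prices into num_ranges buckets with (roughly) equal doc counts.
    Returns a list of (low, high) tuples over the sorted values."""
    values = sorted(prices)
    n = len(values)
    ranges = []
    for i in range(num_ranges):
        lo = i * n // num_ranges
        hi = (i + 1) * n // num_ranges - 1
        if lo > hi:
            continue  # fewer docs than requested ranges
        ranges.append((values[lo], values[hi]))
    return ranges
```

Each resulting (low, high) pair can then be turned into a `price:[low TO high]` filter or facet query.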



Re: Field grouping?

2011-08-31 Thread Denis Kuzmenok
But I don't know what the values of the price field will be in that query.
It can be 100-1000, or 10-100, and I want to get the ranges in every query,
just splitting the price field by doc counts.

 Yes, Ranged Facets
 http://wiki.apache.org/solr/SimpleFacetParameters#Facet_by_Range

 2011/8/31 Denis Kuzmenok forward...@ukr.net

 Hi.

 Suppose  i  have  a field price with different values, and i want to
 get  ranges for this field depending on docs count, for example i want
 to  get 5 ranges for 100 docs with 20 docs in each range, 6 ranges for
 200 docs = 34 docs in each field, etc.

 Is it possible with solr?







Re: indexing but not able to search

2011-07-06 Thread Denis Kuzmenok
 Hi All

 I indexed a set of documents using Solr, which are shown in the stats page
 on the admin panel.
 However, the search interface always returns 0 documents to me.
 When I give the query as *:*, it does return me all the 20K odd documents I
 tried indexing just a few hours back.

 Can someone tell me if there is anything I am missing, on the querying
 config part?

 Sowmya.

Please show your solrconfig.xml and the URL you are querying to select results.



Strange behavior

2011-06-14 Thread Denis Kuzmenok
Hi.

I've debugged search on a test machine. After copying the entire Solr
directory to the production server, I've noticed that one query
(SDR S70EE K) matches on the test server but does not on production.
How can that be?



Re: Strange behavior

2011-06-14 Thread Denis Kuzmenok
What should I provide? The OS is the same, the environment is the same,
Solr is completely copied, and searches work, except that one, which is
strange...

 I think you will need to provide more information than this, no-one on this 
 list is omniscient AFAIK.

 François

 On Jun 14, 2011, at 10:44 AM, Denis Kuzmenok wrote:

 Hi.
 
 I've  debugged search on test machine, after copying to production server
 the  entire  directory  (entire solr directory), i've noticed that one
 query  (SDR  S70EE  K)  does  match  on  test  server, and does not on
 production.
 How can that be?
 






Edismax sorting help

2011-06-09 Thread Denis Kuzmenok
Hi, everyone.

I have fields:
text fields: name, title, text
boolean field: isflag (true / false)
int field: popularity (0 to 9)

Now i do query:
defType=edismax
start=0
rows=20
fl=id,name
q=lg optimus
fq=
qf=name^3 title text^0.3
sort=score desc
pf=name
bf=isflag sqrt(popularity)
mm=100%
debugQuery=on


If I do a query like "Samsung", I want to see the most relevant results
with isflag:true and higher popularity first; but if I do a query like
"Nokia 6500" and the exact match has isflag:false, it should still rank
higher because of the exact match. I tried different combinations, but
didn't find one that suits me; I only got isflag/popularity sorting or
isflag/relevance sorting working.



Re: Edismax sorting help

2011-06-09 Thread Denis Kuzmenok
Your solution seems to work fine; not perfect, but much better than
mine :)
Thanks!

 If i do query like Samsung i want to see prior most relevant results
 with  isflag:true and bigger popularity, but if i do query like Nokia
 6500  and  there is isflag:false, then it should be higher because of
 exact  match.  Tried different combinations, but didn't found one that
 suites   me.   Just   got   isflag/popularity   sorting   working   or
 isflag/relevancy sorting.

 Multiplicative boosts tend to be more stable...

 Perhaps try replacing
   bf=isflag sqrt(popularity)
 with
   bq=isflag:true^10  // vary the boost to change how much
 isflag counts vs the relevancy score of the main query
   boost=sqrt(popularity)  // this will multiply the result by
 sqrt(popularity)... assumes that every document has a non-zero
 popularity

 You could get more creative in trunk where booleans have better
 support in function queries.

 -Yonik
 http://www.lucidimagination.com
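Spelled out as request parameters, the suggested change looks roughly like this (a sketch: the ^10 boost is a placeholder to tune, and the boost parameter is specific to edismax):

```python
# edismax request: multiplicative popularity boost instead of additive bf
params = {
    "defType": "edismax",
    "q": "lg optimus",
    "qf": "name^3 title text^0.3",
    "pf": "name",
    "mm": "100%",
    "bq": "isflag:true^10",       # additive boolean boost; tune the ^10
    "boost": "sqrt(popularity)",  # multiplies the whole relevance score
}
```

Note the original additive `bf=isflag sqrt(popularity)` is gone; the popularity term now scales the score instead of being added to it.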






Re: Boosting result on query.

2011-06-08 Thread Denis Kuzmenok
 If you could move to 3.x and your linked item boosts could be
 calculated offline in batch periodically you could use an external
 file field to store the doc boost.

 a few If's though

I have 3.2, and an external file field doesn't work without a Solr restart
(on a multicore instance).



Re: Problem with boosting function

2011-06-08 Thread Denis Kuzmenok
Please show your full request to Solr (all params).

 Hi,
 I'm trying to use bf parameter in solr queries but I'm having some problems.

 The context is: I have some topics and a integer weight of popularity
 (number of users that follow the topic). I'd like to boost the documents
 according to this weight field, and it changes (users may start following or
 unfollowing that topic). I through the best way to do that is adding a bf
 parameter to the query.

 First of all I was trying to include it in a query processed by a default
 SearchHandler. I debugged the results and the scores didn't change. So I
 tried to change the defType of the SearchHandler to dismax (I didn't add any
 other field in solrconfig), and queries didn't work anymore.

 What is the best way to achieve what I want? Do I really need to use a
 dismax SearchHander (I read about it, and I don't want to search in multple
 fields - I want to search in one field and boost in another one)?

 Thanks in advance

 Alex Grilo




Re: Problem with boosting function

2011-06-08 Thread Denis Kuzmenok
try:

q=title:Unicamp&defType=dismax&bf=question_count^5.0

title:Unicamp in any search handler will search only in the requested field.

 The queries I am trying to do are
 q=title:Unicamp

 and

 q=title:Unicamp&bf=question_count^5.0

 The boosting factor (5.0) is just to verify if it was really used.

 Thanks

 Alex





Re: Documents update

2011-06-07 Thread Denis Kuzmenok
I created the file and reloaded Solr: ExternalFileField works fine. But if I
change the external files and do curl
http://127.0.0.1:4900/solr/site/update -H 'Content-Type: text/xml'
--data-binary '<commit/>'
then no changes are picked up. If I start Solr without the external files
and then create them, they are not picked up either.
What is wrong?

PS: Solr 3.2
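For anyone reproducing this: the ExternalFileField data file is a plain text file named external_<fieldname> in the index data directory, one key=value line per document, where the keys are the uniqueKey values. A hedged sketch of generating such a file (the field name and values are made up):

```python
import os
import tempfile

def write_external_field(path, values):
    """Write an ExternalFileField data file: one 'key=value' line per doc."""
    with open(path, "w") as f:
        for doc_id, value in sorted(values.items()):
            f.write("%s=%s\n" % (doc_id, value))

# e.g. write_external_field("<dataDir>/external_popularity", {"doc1": 1.5})
```

The file has to land next to the index (the core's data/index directory) for Solr to pick it up.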

 http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html

 On Tuesday 31 May 2011 15:41:32 Denis Kuzmenok wrote:
 Flags   are   stored  to filter results and it's pretty highloaded, it's
 working  fine,  but i can't update index very often just to make flags
 up to time =\
 Where can i read about using external fields / files?
 
  And it wouldn't work unless all the data is stored anyway. Currently
  there's no way to update a single field in a document, although there's
  work being done in that direction (see the column stride JIRA).
  
  What do you want to do with these fields? If it's to influence scoring,
  you could look at external fields.
  
  If the flags are a selection criteria, it's...harder. What are the flags
  used for? Could you consider essentially storing a map of the
  uniqueKey's and flags in a special document and having your app
  read that document and merge the results with the output? If this seems
  irrelevant, a more complete statement of the use-case would be helpful.
  
  Best
  Erick





Need query help

2011-06-06 Thread Denis Kuzmenok
For now i have a collection with:
id (int)
price (double) multivalue
brand_id (int)
filters (string) multivalue

I need to get the available brand_id, filters, and price values, and the
list of ids for the current query. For example, now I'm doing queries with
facet.field=brand_id/filters/price:
1) to get current id's list: (brand_id:100 OR brand_id:150) AND (filters:p1s100 
OR filters:p4s20)
2) to get available filters on selected properties (same properties but
another  values):  (brand_id:100 OR brand_id:150) AND (filters:p1s* OR
filters:p4s*)
3) to get available brand_id (if any are selected, if none - take from
1st query results): (filters:p1s100 OR filters:p4s20)
4) another request to get available prices if any are selected

Is there any way to simplify this task?
Data needed:
1) Ids for the selected filters, price, brand_id
2) Available filters, price, brand_id for the selected values
3) Other values for the selected properties (if any are chosen)
4) Other brand_id values besides the selected brand_id
5) Other price values besides the selected price

Will appreciate any help or thoughts!

Cheers,
Denis Kuzmenok
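One feature that may collapse queries 2-4 above into a single request is Solr's multi-select faceting via LocalParams tag/exclusion: tag each fq, then exclude that tag when faceting on the same field, so each field's facet counts are computed as if its own filter were not applied. A hedged sketch of the parameter set (field values copied from the examples above; the tag names are arbitrary):

```python
# Multi-select faceting: one request returns the ids for the selected
# filters plus, per field, facet counts that ignore that field's own filter.
params = [
    ("q", "*:*"),
    ("fq", "{!tag=br}brand_id:(100 OR 150)"),
    ("fq", "{!tag=fl}filters:(p1s100 OR p4s20)"),
    ("facet", "true"),
    ("facet.field", "{!ex=br}brand_id"),
    ("facet.field", "{!ex=fl}filters"),
    ("facet.field", "{!ex=pr}price"),
]
```

Note params is a list of pairs, not a dict, because fq and facet.field repeat.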



Re: Solr memory consumption

2011-06-02 Thread Denis Kuzmenok
 Hey Denis,

 * How big is your index in terms of number of documents and index size?
5 cores, averaging 250,000 documents each; one with about 1 million (but
without text, just int/float fields), and one with about 10 million
id/name documents, but with n-grams.
Size: 4 databases are about 1G in total, 1 database (with n-grams) is 21G.
I don't know any other way to search for product names except n-grams =\

 * Is it production system where you have many search requests?
Yes, it depends on the database, but not less than 100 req/sec.

 * Is there any pattern for OOM errors? I.e. right after you start your
 Solr app, after some search activity or specific Solr queries, etc?
No, Java just keeps increasing its memory usage all the time until it crashes.

 * What are 1) cache settings 2) facets and sort-by fields 3) commit
 frequency and warmup queries?
All settings are default (as given in trunk/example).
Facets are used; sort-by is also used.
Commits are divided into 2 groups:
- frequent but small (last-changed info)
- once per day, the whole database

 etc

 Generally you might want to connect to your jvm using jconsole tool
 and monitor your heap usage (and other JVM/Solr numbers)

 * http://java.sun.com/developer/technicalArticles/J2SE/jconsole.html
 * http://wiki.apache.org/solr/SolrJmx#Remote_Connection_to_Solr_JMX

 HTH,
 Alexey





Need Schema help

2011-06-02 Thread Denis Kuzmenok
Hi)

What I need:
Index prices for products; each product has multiple prices, one per
region and country.
I tried to do this with a field of type long, multiValued=true, forming the
value as country code + region code + price (1004000349601, for
example), but it behaves strangely: price:[* TO 1004000349600] does
include 1004000349601. Am I doing something wrong?

Possible data:
Country: 1-9
Region: 0-99
Price: 1-999
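A possible explanation (a guess, not a diagnosis): on a multiValued field, a range query matches the document if any of its stored values is in range, so a product whose price list also contains a smaller encoded value will show up even though 1004000349601 itself is outside [* TO 1004000349600]. A toy illustration of that matching rule, with made-up encoded values:

```python
def doc_matches_range(values, lo, hi):
    """A range query on a multiValued field matches if ANY stored value is in range."""
    return any(lo <= v <= hi for v in values)

# one product with prices encoded for two country/region combinations
doc = [1004000349601, 1003000200100]
```

If that is what is happening, the document is matched via its second value, not the first one.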



Re: Need Schema help

2011-06-02 Thread Denis Kuzmenok
Thursday, June 2, 2011, 6:29:23 PM, you wrote:
Wow. This sounds nice. Will try this way. Thanks!

 Denis,

 Would dynamic fields help? A field defined as *_price in the schema.

 At index time you index fields named like:
 [1-9]_[0-99]_price

 At query time you search the price field for a given country/region:
 1_10_price:[10 TO 100]

 This may work for some use cases, I guess.

 lee
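The naming convention in the dynamic-field suggestion above is easy to generate programmatically. A small hedged helper, assuming field names follow the *_price dynamicField pattern described in the mail:

```python
def price_field(country, region):
    # matches a dynamicField pattern like *_price in the schema
    return "%d_%d_price" % (country, region)

def price_range_query(country, region, lo, hi):
    # builds the per-country/region range query from the mail's example
    return "%s:[%d TO %d]" % (price_field(country, region), lo, hi)
```

With 9 countries and 100 regions this creates at most 900 distinct fields per document, which the dynamicField mechanism handles without schema changes.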







Re: Solr memory consumption

2011-06-01 Thread Denis Kuzmenok
My OS is also CentOS (5.4). If it stayed at 10GB all the time it would be
OK, but it grows to 13-15GB and hurts other services =\


 It could be environment specific (specific of your top command
 implementation, OS, etc)

 I have on CentOS 2986m virtual memory showing although -Xmx2g

 You have 10g virtual although -Xmx6g 

 Don't trust it too much... the top command may count OS buffers for opened
 files, network sockets, the JVM's own DLLs, etc. (which are outside the Java
 GC's responsibility) in addition to the JVM heap... it counts all memory, not
 sure... if you don't see big values for %wa (which means I/O wait, i.e.
 disk/swap usage), everything is fine...



 -Original Message-
 From: Denis Kuzmenok 
 Sent: May-31-11 4:18 PM
 To: solr-user@lucene.apache.org
 Subject: Solr memory consumption

 I  run  multiple-core  solr with flags: -Xms3g -Xmx6g -D64, but i see this
 in top after 6-8 hours and still raising:

 17485  test214 10.0g 7.4g 9760 S 308.2 31.3 448:00.75 java
 -Xms3g -Xmx6g -D64
 -Dsolr.solr.home=/home/test/solr/example/multicore/ -jar
 start.jar
   
 Are there any ways to limit memory for sure?

 Thanks







Re: Solr memory consumption

2011-06-01 Thread Denis Kuzmenok
Here is the output after about 24 hours of running Solr. Maybe there is
some way to limit memory consumption? :(


test@d6 ~/solr/example $ java -Xms3g -Xmx6g -D64
-Dsolr.solr.home=/home/test/solr/example/multicore/ -jar start.jar
2011-05-31 17:05:14.265:INFO::Logging to STDERR via org.mortbay.log.StdErrLog
2011-05-31 17:05:14.355:INFO::jetty-6.1-SNAPSHOT
2011-05-31 17:05:16.447:INFO::Started SocketConnector@0.0.0.0:4900
#
# A fatal error has been detected by the Java Runtime Environment:
#
# java.lang.OutOfMemoryError: requested 32744 bytes for ChunkPool::allocate. 
Out of swap space?
#
#  Internal Error (allocation.cpp:117), pid=17485, tid=1090320704
#  Error: ChunkPool::allocate
#
# JRE version: 6.0_17-b17
# Java VM: OpenJDK 64-Bit Server VM (14.0-b16 mixed mode linux-amd64 )
# Derivative: IcedTea6 1.7.5
# Distribution: Custom build (Wed Oct 13 13:04:40 EDT 2010)
# An error report file with more information is saved as:
# /mnt/data/solr/example/hs_err_pid17485.log
#
# If you would like to submit a bug report, please include
# instructions how to reproduce the bug and visit:
#   http://icedtea.classpath.org/bugzilla
#
Aborted


 I  run  multiple-core  solr with flags: -Xms3g -Xmx6g -D64, but i see
 this in top after 6-8 hours and still raising:

 17485  test214 10.0g 7.4g 9760 S 308.2 31.3 448:00.75 java
 -Xms3g -Xmx6g -D64
 -Dsolr.solr.home=/home/test/solr/example/multicore/ -jar start.jar
   
 Are there any ways to limit memory for sure?

 Thanks






Re: Solr memory consumption

2011-06-01 Thread Denis Kuzmenok
So what should I do to avoid that error?
I can use 10G on the server; now I'm trying to run with the flags:
java -Xms6G -Xmx6G -XX:MaxPermSize=1G -XX:PermSize=512M -D64

Or should I set Xmx to a lower number, and what about the other params?
Sorry, I don't know much about Java/JVM =(



Wednesday, June 1, 2011, 7:29:50 PM, you wrote:

 Are you in fact out of swap space, as the java error suggested?

 The way JVM's work always, if you tell it -Xmx6g, it WILL use all 6g 
 eventually.  The JVM doesn't Garbage Collect until it's going to run out
 of heap space, until it gets to your Xmx.  It will keep using RAM until
 it reaches your Xmx.

 If your Xmx is set so high you don't have enough RAM available, that 
 will be a problem, you don't want to set Xmx like this. Ideally you 
 don't even want to swap, but normally the OS will swap to give you 
 enough RAM if neccesary -- if you don't have swap space for it to do 
 that, to give the JVM the 6g you've configured it to take well, that
 seems to be what the Java error message is telling you. Of course 
 sometimes error messages are misleading.

 But yes, if you set Xmx to 6G, the process WILL use all 6G eventually.
 This is just how the JVM works.




Re: Solr memory consumption

2011-06-01 Thread Denis Kuzmenok
Overall memory on the server is 24G, plus 24G of swap; most of the time the
swap is free and not used at all, which is why "no free swap" sounds
strange to me..


 There is no simple answer.

 All I can say is you don't usually want to use an Xmx that's more than
 you actually have available RAM, and _can't_ use more than you have 
 available ram+swap, and the Java error seems to be suggesting you are 
 using more than is available in ram+swap. That may not be what's going
 on, JVM memory issues are indeed confusing.

 Why don't you start smaller, and see what happens.  But if you end up 
 needing more RAM for your Solr than you have available on the server, 
 then you're just going to need more RAM.

 You may have to learn something about java/jvm to do memory tuning for
 Solr. Or, just start with the default parameters from the Solr example
 jetty, and if you don't run into any problems, then great.  Starting 
 with the example jetty shipped with Solr would be the easiest way to get
 started for someone who doesn't know much about Java/JVM.

 On 6/1/2011 12:37 PM, Denis Kuzmenok wrote:
 So what should i do to evoid that error?
 I can use 10G on server, now i try to run with flags:
 java -Xms6G -Xmx6G -XX:MaxPermSize=1G -XX:PermSize=512M -D64

 Or should i set xmx to lower numbers and what about other params?
 Sorry, i don't know much about java/jvm =(



 Wednesday, June 1, 2011, 7:29:50 PM, you wrote:

 Are you in fact out of swap space, as the java error suggested?
 The way JVM's work always, if you tell it -Xmx6g, it WILL use all 6g
 eventually.  The JVM doesn't Garbage Collect until it's going to run out
 of heap space, until it gets to your Xmx.  It will keep using RAM until
 it reaches your Xmx.
 If your Xmx is set so high you don't have enough RAM available, that
 will be a problem, you don't want to set Xmx like this. Ideally you
 don't even want to swap, but normally the OS will swap to give you
 enough RAM if neccesary -- if you don't have swap space for it to do
 that, to give the JVM the 6g you've configured it to take well, that
 seems to be what the Java error message is telling you. Of course
 sometimes error messages are misleading.
 But yes, if you set Xmx to 6G, the process WILL use all 6G eventually.
 This is just how the JVM works.







Re: Solr memory consumption

2011-06-01 Thread Denis Kuzmenok
There were no parameters at all, and Java hit an out-of-memory error
almost every day. Then I tried to add parameters, but nothing changed;
Xms/Xmx did not solve the problem either. Now I'm trying MaxPermSize,
because it's the last thing I haven't tried yet :(


Wednesday, June 1, 2011, 9:00:56 PM, you wrote:

 Could be related to your crazy high MaxPermSize like Marcus said.

 I'm no JVM tuning expert either. Few people are, it's confusing. So if
 you don't understand it either, why are you trying to throw in very 
 non-standard parameters you don't understand?  Just start with whatever
 the Solr example jetty has, and only change things if you have a reason
 to (that you understand).

 On 6/1/2011 1:19 PM, Denis Kuzmenok wrote:
 Overall  memory on server is 24G, and 24G of swap, mostly all the time
 swap  is  free and is not used at all, that's why no free swap sound
 strange to me..






Re: Documents update

2011-05-31 Thread Denis Kuzmenok
The flags are stored to filter results, and the system is under pretty heavy
load. It's working fine, but I can't update the index very often just to
keep the flags up to date =\
Where can I read about using external fields/files?


 And it wouldn't work unless all the data is stored anyway. Currently there's
 no way to update a single field in a document, although there's work being
 done in that direction (see the column stride JIRA).

 What do you want to do with these fields? If it's to influence scoring, you
 could look at external fields.

 If the flags are a selection criteria, it's...harder. What are the flags
 used for? Could you consider essentially storing a map of the
 uniqueKey's and flags in a special document and having your app
 read that document and merge the results with the output? If this seems
 irrelevant, a more complete statement of the use-case would be helpful.

 Best
 Erick







Re: Documents update

2011-05-31 Thread Denis Kuzmenok
Will it be slow if there are 3-5 million key/value rows?

 http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html

 On Tuesday 31 May 2011 15:41:32 Denis Kuzmenok wrote:
 Flags   are   stored  to filter results and it's pretty highloaded, it's
 working  fine,  but i can't update index very often just to make flags
 up to time =\
 Where can i read about using external fields / files?






Solr memory consumption

2011-05-31 Thread Denis Kuzmenok
I run multi-core Solr with the flags -Xms3g -Xmx6g -D64, but I see
this in top after 6-8 hours, and it's still rising:

17485  test214 10.0g 7.4g 9760 S 308.2 31.3 448:00.75 java
-Xms3g -Xmx6g -D64 -Dsolr.solr.home=/home/test/solr/example/multicore/ -jar 
start.jar
  
Are there any ways to limit memory for sure?

Thanks



n-gram speed

2011-05-30 Thread Denis Kuzmenok
I have a database with an n-gram field, about 5 million documents. QTime
is about 200-1000 ms; the database is not optimized because it must answer
queries at all times and the data is updated often. Is this normal?
Solr: 3.1, java -Xms2048M -Xmx4096M
Server: i7, 12GB




Solr 3.1 commit errors

2011-05-30 Thread Denis Kuzmenok
After a restart I get these errors every time I do a commit via post.jar.

Config: multicore / 5 cores, Solr 3.1

Lock obtain timed out: SimpleFSLock@/home/ava/solr/example/multicore/context/data/index/write.lock
org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: SimpleFSLock@/home/ava/solr/example/multicore/context/data/index/write.lock
  at org.apache.lucene.store.Lock.obtain(Lock.java:84)
  at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1097)
  at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:83)
  at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:102)
  at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:174)
  at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:222)
  at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
  at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:147)
  at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
  at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
  at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
  at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
  at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
  at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
  at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
  at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
  at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
  at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
  at org.mortbay.j

I tried to Google it a bit, but without any luck..



Re: Documents update

2011-05-27 Thread Denis Kuzmenok
I'm using 3.1 now. Indexing takes a few hours, and the raw data size is
big; fetching all documents first would be rather slow :(


 Not with 1.4, but apparently there is a patch for trunk. Not
 sure if it is in 3.1.

 If you are on 1.4, you could first query Solr to get the data
 for the document to be changed, change the modified values,
 and make a complete XML, including all fields, for post.jar.

 Regards,
 Gora






XML Update overwrite?

2011-05-13 Thread Denis Kuzmenok
Hi.

I'm trying to understand the meaning of overwrite=false in the XML that I
post with post.jar.

I can see two possible behaviours:
1) if a document with the specified uniqueKey exists, it's not updated
(even if some fields have changed)
2) if a document with the specified uniqueKey exists and all fields
are the same, it's not updated and is skipped

Which way does overwrite=false actually work?
Thanks



Solr sorting

2011-03-14 Thread Denis Kuzmenok
Hi.
Is there any way to make the following scheme work:
I have many documents, each with a random field (to enable random
sorting) and a weight field.
I want to get random results, but documents with a bigger weight should
appear more frequently.

Is that possible?

Thanks, in advance.
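Outside of Solr, weight-biased random ordering is commonly done with the key w = random() ** (1 / weight): higher weight pushes the key toward 1, so heavier documents tend to sort first while still being random. A hedged sketch of the idea; whether this exact expression can be written as a Solr function query depends on the version:

```python
import random

def weighted_shuffle(docs, seed=None):
    """Order docs randomly, but so that higher-weight docs tend to come first.
    docs: list of (doc_id, weight) with weight > 0.
    Uses the exponential-key trick: sort key = random() ** (1 / weight)."""
    rng = random.Random(seed)
    return sorted(docs, key=lambda d: rng.random() ** (1.0 / d[1]), reverse=True)
```

With weights 10 and 1, the heavy document comes first with probability 10/11, so it appears on top "more frequently" rather than always.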



Re: Solr sorting

2011-03-14 Thread Denis Kuzmenok

 --- On Mon, 3/14/11, Denis Kuzmenok forward...@ukr.net wrote:

 From: Denis Kuzmenok forward...@ukr.net
 Subject: Solr sorting
 To: solr-user@lucene.apache.org
 Date: Monday, March 14, 2011, 10:23 AM
 Hi.
 Is there any way to make such a scheme work:
 I have many documents, each has a random field to enable random
 sorting, and I have a weight field.
 I want to get random results, but documents with bigger weight
 should appear more frequently.


 You can use http://wiki.apache.org/solr/FunctionQuery
 An example can be : sort=product(random_123,weight)
 But this requires solr3.1 or trunk.
 With 1.4.1 you can use _val_ hook :
 q=_val_:product(random_78,weight)

Tried this, with a query over one category of documents. The top
documents have weights of about 1700, 1000, 800, and 600, so the top
document always lands in the top position :( I guess the weight field
values need some limiting or scaling..
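Since the raw weights (1700 down to 600) dominate the random factor, one common fix is to compress them before multiplying, e.g. log-scaling (in function-query terms something roughly like product(random_123, log(weight)); hedged, the exact syntax depends on the Solr version). The effect in plain numbers:

```python
import math

raw = [1700, 1000, 800, 600]
damped = [math.log10(w) for w in raw]
# raw spans a ~2.8x range; the log-damped weights span only ~1.17x,
# so the random factor can actually reorder the results.
```

The heavier documents still win on average, but no longer deterministically.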






Solr context search

2010-11-17 Thread Denis Kuzmenok
Hi.
I wonder whether there is a built-in way to do context search in Solr?

I have about 50k documents (mainly a 'name' field of char(150)); I receive
the content of a page and should show the documents found in it.

Of course I can just join the terms with OR and submit a search, but the
accuracy would not be very good..

Thanks in advance.