Re: Remove the deleted docs from the Solr Index

2009-12-29 Thread Ravi Gidwani
Hi Shalin:

   I get your point about not knowing what has been deleted from the
 database. So here is what I am looking for:

 0) A document (id=100) is currently part of the Solr index.
 1) Let's say the application deleted the record with id=100 from the database.

 2) Now I need to execute some DIH command to say: remove the document where
 id=100. I don't expect the DIH to automatically detect what has been deleted,
 but I am looking for a DIH command/special-command to request deletion from
 the index.

 Is that possible? Also, as an alternate solution, is it possible to build
 the index using DIH, and use the solr.XmlUpdateRequestHandler request
 handler to delete/update these one-off documents?
 Is this something you would recommend?
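
 i.e., something like this posted to the update handler (a sketch, assuming
 id is the uniqueKey field):

 curl http://localhost:8983/solr/update -H 'Content-type: text/xml' \
      --data-binary '<delete><id>100</id></delete>'
 curl http://localhost:8983/solr/update -H 'Content-type: text/xml' \
      --data-binary '<commit/>'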

 Thanks,
 ~Ravi Gidwani.

 On Tue, Dec 29, 2009 at 3:03 AM, Mohamed Parvez par...@gmail.com wrote:

  I have looked in that thread earlier. But there is no option there for a
  solution from the Solr side.

  I mean, the other two options there are:
  1] Use database triggers instead of DIH to manage updating the index:
  This is out of the question as we can't run 1000-odd triggers every hour
  to delete.

 
  2] Use some sort of ORM and its interception:
  This is also out of the question as the deletes happen from an external
  system or directly on the database, not through our application.

  To say it in short, Solr should have something to keep the index synced
  with the database. As of now it's a one-way street: updated rows in the DB
  will go to the index, but deleted rows in the DB will not be deleted from
  the index.

 
 
 How can Solr figure out what has been deleted? Should it go through each row
 and compare it against each doc? Even then some things are not possible
 (think indexed fields). It would be far more efficient to just do a
 full-import each time instead.
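
 For reference, with DIH registered at the usual /dataimport path, that is:

 http://localhost:8983/solr/dataimport?command=full-import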

 --
 Regards,
 Shalin Shekhar Mangar.




Re: fl parameter and dynamic fields

2009-12-29 Thread Noble Paul നോബിള്‍ नोब्ळ्
if you wish to search on fields using wild-card you have to use a
copyField to copy all the values of Bool_* to another field and
search on that field.
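
a rough schema.xml sketch of that (the catch-all field name is just an
example):

  <dynamicField name="Bool_*" type="boolean" indexed="true" stored="true" />
  <field name="bool_all" type="boolean" indexed="true" stored="false"
         multiValued="true" />
  <copyField source="Bool_*" dest="bool_all" />

then query q=bool_all:true instead of q=Bool_*:true.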


On Tue, Dec 29, 2009 at 4:14 AM, Harsch, Timothy J. (ARC-TI)[PEROT
SYSTEMS] timothy.j.har...@nasa.gov wrote:
 I use dynamic fields heavily in my SOLR config.  I would like to be able to
 specify which fields should be returned from a query based on a pattern for
 the field name.  For instance, given:

            <dynamicField name="Bool_*" type="boolean"
                  indexed="true" stored="true" />

 I might be able to construct a query like:
 http://localhost:8080/solr/select?q=Bool_*:true&rows=10

 Is there something like this in SOLR?

 Thanks,
 Tim Harsch





-- 
-
Noble Paul | Systems Architect| AOL | http://aol.com


RE: Solr and Greek Chars

2009-12-29 Thread ZAROGKIKAS,GIORGOS
Ok
My configuration is correct.
I found the problem:

curl had problems with Greek chars,
so I developed an application and passed my data via HTTP POST,
and it's OK.

Thanks

-Original Message-
From: Markus Jelsma [mailto:mar...@buyways.nl] 
Sent: Monday, December 28, 2009 6:26 PM
To: solr-user@lucene.apache.org
Cc: ZAROGKIKAS,GIORGOS
Subject: Re: Solr and Greek Chars

Hi,


Did you post your documents in UTF-8? Also, for querying through GET using
non-ascii you must reconfigure Tomcat6 as per the manual [1].


Cheers,

[1] http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config
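
In short, that page boils down to adding URIEncoding to the HTTP connector
in Tomcat's server.xml, roughly:

  <Connector port="8080" protocol="HTTP/1.1" URIEncoding="UTF-8" />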

ZAROGKIKAS,GIORGOS wrote:
 Hi there

 I'm using Solr 1.4 under a Tomcat server on Windows Server 2008,
 and I want to index some data that contains Greek chars.

 When I try to index my data and query all of it with *:*,
 all the Greek chars are returned like hieroglyphics.

 Can anybody help???

 thanks in advance

 -
 Γεώργιος Ζαρογκίκας
 Τμήμα Μηχανογράφησης
 6936801497
 g.zarogki...@multirama.gr
 23o Xλμ Εθ. Οδού Αθήνων Λαμίας
 ΤΚ. 14564

 Please consider the environment before printing this e-mail








Limiting Solr queries to predefined Values

2009-12-29 Thread zoku

Hi there!
Is it possible to limit the Solr queries to predefined values? E.g.:
if the user enters /select?q=anyword&fq=anyfilter&rows=13, then the filter
and rows arguments are ignored and overwritten by the predefined values
specialfilter and 6.

The goal is to prevent users from getting particular information (e.g. very
big fields, to control bandwidth) sent to their browser.

Maybe it could be done by blocking access to the /select path using
Resin (if it can use regexes to describe url-patterns).

Sincerely
Manuel Helbing
-- 
View this message in context: 
http://old.nabble.com/Limiting-Solr-queries-to-predefined-Values-tp26954887p26954887.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Limiting Solr queries to predefined Values

2009-12-29 Thread Erik Hatcher


On Dec 29, 2009, at 8:59 AM, zoku wrote:

Hi there!
Is it possible to limit the Solr queries to predefined values? E.g.:
if the user enters /select?q=anyword&fq=anyfilter&rows=13, then the filter
and rows arguments are ignored and overwritten by the predefined values
specialfilter and 6.


Yes, you can set up the request handler mapping with an
"invariants" (instead of "defaults") section with those parameters
specified in solrconfig.xml.


However, with fq, you may want to put that in an appends section  
instead, so that other filters can be specified from the client as well.
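
A minimal solrconfig.xml sketch of that (handler name and values are only
illustrative):

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="invariants">
      <int name="rows">6</int>
    </lst>
    <lst name="appends">
      <str name="fq">specialfilter</str>
    </lst>
  </requestHandler>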


Erik



how to do a Parent/Child Mapping using entities

2009-12-29 Thread magui

Hello everybody, i would like to know how to create an index supporting a
parent/child mapping and then query the child to get the results.
In other words, imagine that we have a database containing 2
tables: Keyword[id(int), value(string)] and Result[id(int), res_url(text),
res_text(text), res_date(date), res_rank(int)].
For indexing, i used the DataImportHandler to import data and it works well,
and my query response seems good (q=*:*) (imagine that we have only these
two keywords and their results):

<?xml version="1.0" encoding="UTF-8" ?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="q">*:*</str>
    </lst>
  </lst>
  <result name="response" numFound="2" start="0">
    <doc>
      <str name="id">1</str>
      <str name="keyword">Key1</str>
      <arr name="res_url">
        <str>url1</str>
        <str>url2</str>
        <str>url3</str>
        <str>url4</str>
      </arr>
      <arr name="res_rank">
        <str>1</str>
        <str>2</str>
        <str>3</str>
        <str>4</str>
      </arr>
    </doc>
    <doc>
      <str name="id">2</str>
      <str name="keyword">Key2</str>
      <arr name="res_url">
        <str>url1</str>
        <str>url5</str>
        <str>url8</str>
        <str>url7</str>
      </arr>
      <arr name="res_rank">
        <str>1</str>
        <str>2</str>
        <str>3</str>
        <str>4</str>
      </arr>
    </doc>
  </result>
</response>

but the problem is when i type a query like this: q=res_url:url2 AND
res_rank:1, meaning that i want to search for the keywords for which the
url (url2) is ranked at the first position; i get a result like this:

<?xml version="1.0" encoding="UTF-8" ?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="q">res_url:url2 AND res_rank:1</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="id">1</str>
      <str name="keyword">Key1</str>
      <arr name="res_url">
        <str>url1</str>
        <str>url2</str>
        <str>url3</str>
        <str>url4</str>
      </arr>
      <arr name="res_rank">
        <str>1</str>
        <str>2</str>
        <str>3</str>
        <str>4</str>
      </arr>
    </doc>
  </result>
</response>

But this is not true, because the url present in the 1st position in the
results of the keyword Key1 is url1 and not url2.
So what i want to ask is: is there any solution to make the values of the
multivalued fields linked?
In our case, the previous result should be read as saying that:
 - url1 is present in 1st position of key1 results
 - url2 is present in 2nd position of key1 results
 - url3 is present in 3rd position of key1 results
 - url4 is present in 4th position of key1 results

and i would like solr to take this into account when executing queries.

Any help please; and thanks for all :)
-- 
View this message in context: 
http://old.nabble.com/how-to-do-a-Parent-Child-Mapping-using-entities-tp26956426p26956426.html
Sent from the Solr - User mailing list archive at Nabble.com.



Delete, commit, optimize doesn't reduce index file size

2009-12-29 Thread markwaddle

I have an index that used to have ~38M docs at 17.2GB. I deleted all but 13K
docs using a delete by query, commit and then optimize. A *:* query now
returns 13K docs. The problem is that the files on disk are still 17.1GB in
size. I expected the optimize to shrink the files. Is there a way I can
shrink them now that the index only has 13K docs?

Mark
-- 
View this message in context: 
http://old.nabble.com/Delete%2C-commit%2C-optimize-doesn%27t-reduce-index-file-size-tp26958067p26958067.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Remove the deleted docs from the Solr Index

2009-12-29 Thread Mohamed Parvez
Ditto. There should be a DIH command to re-sync the index with the DB.
Right now it looks like a one-way street from DB to index.
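
Or can deletedPkQuery on a DIH delta-import already cover this, if the DB
logs its deletions somewhere? A sketch, with a made-up deleted_rows audit
table:

<entity name="item" pk="id"
        query="select * from item"
        deletedPkQuery="select id from deleted_rows
                        where deleted_at > '${dataimporter.last_index_time}'">
  <field column="id" name="id" />
</entity>

That would be picked up by command=delta-import.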


On Tue, Dec 29, 2009 at 3:07 AM, Ravi Gidwani ravi.gidw...@gmail.com wrote:

 Hi Shalin:

    I get your point about not knowing what has been deleted from
 the database. So here is what I am looking for: [...]

  Is that possible? Also, as an alternate solution, is it possible to build
 the index using DIH, and use the solr.XmlUpdateRequestHandler request
 handler to delete/update these one-off documents? [...]
 



RE: Unable to delete from index

2009-12-29 Thread Ankit Bhatnagar

It looks like you are using Solr multicore.

How are you setting the Solr home (meaning, which line are you using to tell
Tomcat about your Solr home path)?


Ankit


-Original Message-
From: Giovanni Fernandez-Kincade [mailto:gfernandez-kinc...@capitaliq.com] 
Sent: Monday, December 28, 2009 11:32 PM
To: solr-user@lucene.apache.org
Subject: RE: Unable to delete from index

Here you go. 

Thanks for your help!
Gio.

-Original Message-
From: Ankit Bhatnagar [mailto:abhatna...@vantage.com] 
Sent: Monday, December 28, 2009 10:09 PM
To: 'solr-user@lucene.apache.org'
Subject: RE: Unable to delete from index

Could you share both your solr.xml and solrconfig.xml

Ankit


-Original Message-
From: Giovanni Fernandez-Kincade [mailto:gfernandez-kinc...@capitaliq.com] 
Sent: Monday, December 28, 2009 5:46 PM
To: solr-user@lucene.apache.org
Subject: RE: Unable to delete from index

Sorry - hit reply too early. I edited my config as you suggested, rebooted 
Tomcat, and I can still find the doc through the Solr Admin interface even 
though I can't find it in Luke. 

-Original Message-
From: Giovanni Fernandez-Kincade [mailto:gfernandez-kinc...@capitaliq.com] 
Sent: Monday, December 28, 2009 5:44 PM
To: solr-user@lucene.apache.org
Subject: RE: Unable to delete from index

My HTTP caching is currently configured for openTime:

<httpCaching lastModifiedFrom="openTime"
             etagSeed="Solr">

So that shouldn't be the problem, right?

-Original Message-
From: AHMET ARSLAN [mailto:iori...@yahoo.com] 
Sent: Monday, December 28, 2009 5:31 PM
To: solr-user@lucene.apache.org
Subject: RE: Unable to delete from index

 I opened up my index using Luke,
 found a document by searching for a specific ID
 (versionId:2002366155), and then I deleted it using Luke.
 After committing, performing the search again in Luke
 yielded no results. 
 
 However, when I perform that same search using Solr, I get
 a result. 
 
 That got me thinking that I was opening up the wrong
 directory in Luke but I've double-checked it a few times. 
 
 Is it a problem that I have my data directory defined in
 solr.xml and not in solrconfig.xml?


If you are querying Solr from a browser, can you disable HTTP caching in
solrconfig.xml (<httpCaching never304="true">) and then try again?
  


  


Re: Delete, commit, optimize doesn't reduce index file size

2009-12-29 Thread rob

Hi Mark,

I can't help with reducing filesizes, but I'm curious...

What sort of documents were you storing: number of fields, average document
size, many dynamic fields or mainly all static?

It would be good to hear about a real-world large-scale index in terms of
response times. Did the server have enough RAM to store it all in memory?

Cheers,
Rob




On Tue 29/12/09 18:23 , markwaddle m...@markwaddle.com wrote:

 I have an index that used to have ~38M docs at 17.2GB. I deleted all but 13K
 docs using a delete by query, commit and then optimize. A *:* query now
 returns 13K docs. The problem is that the files on disk are still 17.1GB in
 size. I expected the optimize to shrink the files. Is there a way I can
 shrink them now that the index only has 13K docs?
 Mark
 --
 View this message in context:
 http://old.nabble.com/Delete%2C-commit%2C-optimize-doesn%27t-reduce-index-file-size-tp26958067p26958067.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 
 
Message sent via Atmail Open - http://atmail.org/


boosting on string distance

2009-12-29 Thread Joe Calderon
hello *, i want to boost documents that match the query better;
currently i also index my field as a string and boost if i match the
string field.

but i'm wondering if it's possible to boost with the bf parameter using a
formula with the function strdist(). i know one of the arguments would
be the field name, but how do i specify the user query as the other
parameter?

http://wiki.apache.org/solr/FunctionQuery#strdist


best,

--joe


performance question

2009-12-29 Thread A. Steven Anderson
Greetings!

Is there any significant negative performance impact of using a
dynamicField?

Likewise for multivalued fields?

The reason why I ask is that our system basically aggregates data from many
disparate data sources (structured, unstructured, and semi-structured), and
the management of the schema.xml has become unwieldy; i.e., we currently have
dozens of fields, a list that grows every time we add a new data source.

I was considering redefining the domain model outside of Solr which would be
used to generate the fields for the indexing process and the metadata (e.g.
display names) for the search process.
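
By dynamic fields I mean collapsing the per-source fields into a few typed
patterns; the suffix conventions in this sketch are invented:

<dynamicField name="*_s"   type="string" indexed="true" stored="true" />
<dynamicField name="*_dt"  type="date"   indexed="true" stored="true" />
<dynamicField name="*_txt" type="text"   indexed="true" stored="true"
              multiValued="true" />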

Thoughts?
-- 
A. Steven Anderson
Independent Consultant
st...@asanderson.com


DIH optional fields?

2009-12-29 Thread A. Steven Anderson
Greetings!

I'm trying to index a MySQL database that has some invalid dates (e.g.
0000-00-00) which are causing my DIH to abort.

Ideally, I'd like DIH to skip this optional field but not the whole record.

I don't see any way to do this currently, but is there any work-around?

Should there be a JIRA for a field-level optional or onError parameter?

-- 
A. Steven Anderson
Independent Consultant
st...@asanderson.com


Re: DIH optional fields?

2009-12-29 Thread AHMET ARSLAN
 I'm trying to index a MySQL database that has some invalid
 dates (e.g. 0000-00-00) which are causing my DIH to abort.
 
 Ideally, I'd like DIH to skip this optional field but not
 the whole record.
 
 I don't see any way to do this currently, but is there any
 work-around?

Use the zeroDateTimeBehavior=convertToNull parameter in your SQL connection string.
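
In a DIH data-config.xml that would look roughly like this (host, database
and credentials are placeholders):

<dataSource type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost:3306/mydb?zeroDateTimeBehavior=convertToNull"
            user="solr" password="secret" />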


  


Re: DIH optional fields?

2009-12-29 Thread A. Steven Anderson
 Use the zeroDateTimeBehavior=convertToNull parameter in your SQL connection
 string.


That worked great!

Thanks!

-- 
A. Steven Anderson
Independent Consultant
A. S. Anderson  Associates LLC
P.O. Box 672
Forest Hill, MD  21050-0672
443-790-4269
st...@asanderson.com


Re: Delete, commit, optimize doesn't reduce index file size

2009-12-29 Thread Yonik Seeley
On Tue, Dec 29, 2009 at 1:23 PM, markwaddle m...@markwaddle.com wrote:
 I have an index that used to have ~38M docs at 17.2GB. I deleted all but 13K
 docs using a delete by query, commit and then optimize. A *:* query now
 returns 13K docs. The problem is that the files on disk are still 17.1GB in
 size. I expected the optimize to shrink the files. Is there a way I can
 shrink them now that the index only has 13K docs?

Are you on Windows?
The IndexWriter can't delete files still in use by the current IndexReader
(like it can on UNIX) when the commit is done.
If you make further changes to the index and do a commit, you should
see the space go down.

-Yonik
http://www.lucidimagination.com


SOLR or Hibernate Search?

2009-12-29 Thread Márcio Paulino
Hey Everyone!

I was making a comparison of both technologies (SOLR and Hibernate Search)
and i see many things are equal. Could anyone tell me when i should use SOLR
and when i should use Hibernate Search?

In my project i will have:

1. Queries on indexed fields (Strings) and on not-indexed fields (Integer,
Float, Date). [In Hibernate Search, as in SOLR, i must search on the index
and, with the results of the query, search on the database (i can't search
in both places at the same time).]
I will have searches like:
Give me all Registers Where Value > 190 And Name Contains 'JAVA'

2. My client needs to process a lot of email (20.000 per day) and i must
index all fields (excluding sentDate), including Attachments, and
performance is a requirement of my System.

3. My Application is multi-client, and i need to separate the index by
client.

In this Scenario, what's the best solution? SOLR or Hibernate Search?

I see SOLR is a dedicated server and has good performance tests. I don't
see advantages to using Hibernate Search in comparison with SOLR (except
the fact that it integrates with my Mapped Objects).

Thanks for Help

-- 
att,

**
Márcio Paulino
Campo Grande - MS
MSN / Gtalk: mcopaul...@gmail.com
ICQ: 155897898
**


Re: how to do a Parent/Child Mapping using entities

2009-12-29 Thread Sascha Szott

Hi,

you could create an additional index field res_ranked_url that contains
the concatenated value of a url and its corresponding rank, e.g.,

res_rank + " " + res_url

Then, q=res_ranked_url:"1 url1" retrieves all documents with url1 as the
first url.


A drawback of this workaround is that you have to use a phrase query,
thus preventing wildcard searches for urls.
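
If you build the index with DIH, one way to populate such a field at import
time is the TemplateTransformer; a sketch (the entity name is assumed):

<entity name="result" transformer="TemplateTransformer"
        query="select id, res_url, res_rank from Result">
  <field column="res_ranked_url"
         template="${result.res_rank} ${result.res_url}" />
</entity>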


-Sascha



magui wrote:

 Hello everybody, i would like to know how to create an index supporting a
 parent/child mapping and then query the child to get the results. [...]
 is there any solution to make the values of the multivalued fields linked?
 [...]




Re: Implementing Autocomplete/Query Suggest using Solr

2009-12-29 Thread Prasanna R
 
  We do auto-complete through prefix searches on shingles.

 Just to confirm, do you mean using the EdgeNgram filter to produce letter
 ngrams of the tokens in the chosen field?

No, I'm talking about prefix search on tokens produced by a ShingleFilter.


 I did not know about the Prefix query parser in Solr. Thanks a lot for
 pointing out the same.

 I find relatively little online material about the Solr/Lucene prefix query
 parser. Kindly point me to any useful resource that I might be missing.


 I looked into the Solr/Lucene classes and found the required information.
I am summarizing it here for the benefit of those who might refer to this
thread in the future.

 The change I had to make was very simple - make a call to getPrefixQuery
instead of getWildcardQuery in my custom-modified Solr dismax query parser
class. However, this will make a fairly significant difference in terms of
efficiency. The key difference between the lucene WildcardQuery and
PrefixQuery lies in their respective term enumerators, specifically in the
term comparators. The termCompare method for PrefixQuery is more
light-weight than that of WildcardQuery and is essentially an optimization
given that a prefix query is nothing but a specialized case of Wildcard
query. Also, this is why the lucene query parser automatically creates a
PrefixQuery for query terms of the form 'foo*' instead of a WildcardQuery.
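
As an aside, the stock prefix parser can also be invoked per-query via local
params, without any parser changes; a sketch (the shingled field name is
invented):

http://localhost:8983/solr/select?q={!prefix f=title_shingles}new%20yo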

A big thank you to Shalin for providing valuable guidance and insight.

And one final request for comment to Shalin on this topic: I am guessing
you ensured there were no duplicate terms in the field(s) used for
autocompletion. For our first version, I am thinking of eliminating the
duplicates outside of the results handler that gives suggestions, since
duplicate suggestions originate only from different document IDs in our
system and we do want the list of document IDs matched. Is there a
better/different way of doing the same?

Regards,

Prasanna.


Re: Delete, commit, optimize doesn't reduce index file size

2009-12-29 Thread markwaddle



Yonik Seeley-2 wrote:
 
 On Tue, Dec 29, 2009 at 1:23 PM, markwaddle m...@markwaddle.com wrote:
 I have an index that used to have ~38M docs at 17.2GB. I deleted all but 13K
 docs using a delete by query, commit and then optimize. A *:* query now
 returns 13K docs. The problem is that the files on disk are still 17.1GB in
 size. I expected the optimize to shrink the files. Is there a way I can
 shrink them now that the index only has 13K docs?
 
 Are you on Windows?
 The IndexWriter can't delete files in use by the current IndexReader
 (like it can in UNIX) when the commit is done.
 If you make further changes to the index and do a commit, you should
 see the space go down.
 
 -Yonik
 http://www.lucidimagination.com
 
 

I am on Windows. Would a DataImportHandler delta-import with 1 or more
changes be a sufficient change to allow the files to be deleted?

Mark
-- 
View this message in context: 
http://old.nabble.com/Delete%2C-commit%2C-optimize-doesn%27t-reduce-index-file-size-tp26958067p26960857.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Delete, commit, optimize doesn't reduce index file size

2009-12-29 Thread markwaddle



Yonik Seeley-2 wrote:
 
 If you make further changes to the index and do a commit, you should
 see the space go down.
 

It worked. I added a bogus document using /update and then performed a
commit and now the files are down to 6MB.

http://.../core00/update?stream.body=%3Cadd%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E0%3C/field%3E%3C/doc%3E%3C/add%3E

http://.../core00/update?stream.body=%3Ccommit/%3E

Thanks!
Mark
-- 
View this message in context: 
http://old.nabble.com/Delete%2C-commit%2C-optimize-doesn%27t-reduce-index-file-size-tp26958067p26960957.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: boosting on string distance

2009-12-29 Thread Grant Ingersoll

On Dec 29, 2009, at 1:59 PM, Joe Calderon wrote:

 hello *, i want to boost documents that match the query better;
 currently i also index my field as a string and boost if i match the
 string field.

 but i'm wondering if it's possible to boost with the bf parameter using a
 formula with the function strdist(). i know one of the arguments would
 be the field name, but how do i specify the user query as the other
 parameter?
 
 http://wiki.apache.org/solr/FunctionQuery#strdist

You can just quote the Query, as in strdist("user query", field)

I guess it is a bit more complicated if you have operators in there, so you 
probably need to do some processing on the client side.

Is that what you are after? 

You could also use the fuzzy query and forego a function query altogether, 
although that only allows for edit distance.  Or, if you are simply indexing 
the field as a string, just do a regular term query match against it with your 
whole query as a single field and then boost that clause in the query.
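
Concretely, the bf route might look like this (dismax assumed, field names
invented, the user query repeated as the first argument):

http://localhost:8983/solr/select?defType=dismax&qf=name&q=ipod+mini&bf=strdist(%22ipod+mini%22,name_exact,edit)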



-Grant

Solr Cell - PDFs plus literal metadata - GET or POST ?

2009-12-29 Thread Ross
Hi all

I'm experimenting with Solr. I've successfully indexed some PDFs and
all looks good but now I want to index some PDFs with metadata pulled
from another source. I see this example in the docs.

curl "http://localhost:8983/solr/update/extract?literal.id=doc4&captureAttr=true&defaultField=text&capture=div&fmap.div=foo_t&boost.foo_t=3&literal.blah_s=Bah"
 -F "tutorial=@tutorial.pdf"

I can write code to generate a script with those commands substituting
my own literal.whatever.  My metadata could be up to a couple of KB in
size. Is there a way of making the literal a POST variable rather than
a GET?  Will Solr Cell accept it as a POST? Something doesn't feel
right about generating a huge long URL. I think Tomcat can handle up
to 8 KB by default so I guess that's okay although I'm not sure how
long a Linux command line can reasonably be.
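
i.e., would something like this work, with the big literal sent as a form
field instead of on the URL? (untested sketch):

curl http://localhost:8983/solr/update/extract \
     -F "literal.id=doc4" \
     -F "literal.blah_s=a couple of KB of metadata here" \
     -F "tutorial=@tutorial.pdf"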

I know Curl may not be the right thing to use for production use but
this is initially to get some data indexed for test and demo.

Thanks
Ross


Re: SOLR or Hibernate Search?

2009-12-29 Thread Kiwi de coder
hi,

hibernate search only works with hibernate, while solr can be used by
systems other than hibernate (loose coupling).

solr currently does not support complex POJO indexing like hibernate does.

1) I think one way you can do it is to index in solr, retrieve the unique
ids, and fetch the entities from the database, e.g. select entity from
table where id in (x, y)

2) i have not yet tested it, but i do believe lucene performance is quite
good and it keeps improving (you can add more search servers if you are
using solr)

3) maybe solr is more suitable in your case.

hope this helps

kiwi

2009/12/30 Márcio Paulino mcopaul...@gmail.com

 Hey Everyone!

 I was making a comparison of both technologies (SOLR and Hibernate Search)
 and i see many things are equal. Could anyone tell me when i should use
 SOLR and when i should use Hibernate Search? [...]



Re: SOLR or Hibernate Search?

2009-12-29 Thread Ryan McKinley

If you need to search via the Hibernate API, then use hibernate search.

If you need a scalable HTTP (REST) interface, then Solr may be the way to go.

Also, i don't think hibernate has anything like the faceting / complex  
query stuff etc.




On Dec 29, 2009, at 3:25 PM, Márcio Paulino wrote:


Hey Everyone!

I was making a comparison of both technologies (SOLR and Hibernate Search)
and i see many things are equal. Could anyone tell me when i should use
SOLR and when i should use Hibernate Search? [...]