Re: Solr and Tag Cloud

2011-06-18 Thread Mohammad Shariq
I am also looking for the same. Is there any way to generate a tag cloud for
all the documents matching a specific query?


On 18 June 2011 09:42, Jamie Johnson jej2...@gmail.com wrote:

 Does anyone have details of how to generate a tag cloud of popular terms
 across an entire data set and then also across a query?




-- 
Thanks and Regards
Mohammad Shariq


Re: Caching queries.

2011-06-18 Thread Shawn Heisey

On 6/17/2011 4:26 PM, arian487 wrote:

I'm wondering if something like this is possible. Let's say I want to query
5000 objects all pertaining to a specific search, return the top 100 or so,
and cache the rest on my Solr server. The next time I get the same query, or
the same query with a new offset (let's say starting from 101), does Solr have
to run the query again, or can it go to the cache and get the next 100?


In solrconfig.xml, you should have a query section.  In that section, 
you can place a setting like the following:


<queryResultWindowSize>200</queryResultWindowSize>

This is described in the example solrconfig.xml and here:

http://wiki.apache.org/solr/SolrCaching#queryResultWindowSize
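
To make the effect of the window size concrete, here is a minimal Python sketch of the idea. It is a simplified model, not Solr's actual code: it assumes that the end of the requested range is rounded up to the next multiple of the window size and that document ids are collected from position 0. The function names are made up for illustration.

```python
import math

def cached_range(start, rows, window_size):
    """Return (first, last_exclusive) of the result positions that would be cached.

    Simplified model (an assumption, not Solr source code): the end of the
    requested range is rounded up to the next multiple of window_size and
    collection starts at position 0.
    """
    end = start + rows
    superset_end = math.ceil(end / window_size) * window_size
    return 0, superset_end

def served_from_cache(cached, start, rows):
    """True if the requested page lies entirely inside the cached range."""
    first, last_exclusive = cached
    return first <= start and start + rows <= last_exclusive

first_page = cached_range(start=0, rows=100, window_size=200)
print(first_page)                               # (0, 200)
print(served_from_cache(first_page, 100, 100))  # True
print(served_from_cache(first_page, 200, 100))  # False
```

Under this model, a window size of 200 lets the request for results 101 to 200 reuse the entry cached by the first request, provided the entry is still in the queryResultCache.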

Shawn



Re: Solr and Tag Cloud

2011-06-18 Thread Dmitry Kan
One option would be to load each term into a shingle field and then facet on
that field for the user query.
Another is to use the TermsComponent: http://wiki.apache.org/solr/TermsComponent.

With the first option you can load not only separate terms but also sequences
of terms, and then experiment to find the optimal shingle (n-gram) length.
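
As a rough illustration of the two options, the Python sketch below only builds the request URLs. The base URL, the field name tags_shingle and the /terms handler path are assumptions for the example; the facet.* and terms.* parameters are the standard faceting and TermsComponent parameters.

```python
from urllib.parse import urlencode

def facet_cloud_url(base_url, query, field, limit=50):
    """URL for term counts restricted to documents matching `query`."""
    params = {
        "q": query,
        "rows": 0,             # only the facet counts are needed, not the docs
        "facet": "true",
        "facet.field": field,
        "facet.limit": limit,
        "facet.mincount": 1,
    }
    return base_url + "/select?" + urlencode(params)

def terms_cloud_url(base_url, field, limit=50):
    """URL for term counts across the whole index via the TermsComponent."""
    params = {"terms": "true", "terms.fl": field, "terms.limit": limit}
    return base_url + "/terms?" + urlencode(params)

# Hypothetical host and field name, for illustration only.
base = "http://localhost:8983/solr"
print(facet_cloud_url(base, "title:solr", "tags_shingle"))
print(terms_cloud_url(base, "tags_shingle"))
```

The faceting URL covers the "across a query" case, and the TermsComponent URL covers the "entire data set" case. The counts that come back can then be scaled into font sizes on the client.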

On Sat, Jun 18, 2011 at 7:12 AM, Jamie Johnson jej2...@gmail.com wrote:

 Does anyone have details of how to generate a tag cloud of popular terms
 across an entire data set and then also across a query?




-- 
Regards,

Dmitry Kan


Re: Showing facet of first N docs

2011-06-18 Thread Dmitry Kan
Do you mean you would like to boost the facets that contain most of the
lemmas?
What is the user query in this case and, if possible, what is the use case?
(Maybe some other solution exists for what you are trying to achieve.)

On Thu, Jun 16, 2011 at 5:23 PM, Tommaso Teofili
tommaso.teof...@gmail.com wrote:

 Thanks Dmitry, but maybe I didn't explain correctly as I am not sure
 facet.offset is the right solution, I'd like not to page but to filter
 facets.
 I'll try to explain better with an example.
 Imagine I make a query and first 2 docs in results have both 'xyz' and
 'abc'
 as values for field 'lemmas' while also other docs in the results have
 'xyz'
 or 'abc' as values of field 'lemmas' then I would like to show facets
 coming from only the first 2 docs in the results thus having :
 <lst name="lemmas">
  <str name="xyz">2</str>
  <str name="abc">2</str>
 </lst>
 You can imagine this like a 'give me only facets related to the most
 relevant docs in the results' functionality.
 Any idea on how to do that?
 Tommaso


 2011/6/16 Dmitry Kan dmitry@gmail.com

  http://wiki.apache.org/solr/SimpleFacetParameters
  facet.offset
 
  This param indicates an offset into the list of constraints to allow
  paging.
 
  The default value is 0.
 
  This parameter can be specified on a per field basis.
 
 
  Dmitry
 
 
  On Thu, Jun 16, 2011 at 1:39 PM, Tommaso Teofili
  tommaso.teof...@gmail.com wrote:
 
   Hi all,
   Do you know if it is possible to show the facets for a particular field
   related only to the first N docs of the total number of results?
   It seems facet.limit doesn't help with it as it defines a window in the
   facet constraints returned.
   Thanks in advance,
   Tommaso
  
 
 
 
  --
  Regards,
 
  Dmitry Kan
 




-- 
Regards,

Dmitry Kan


How do i use solr spellchecker in my search application

2011-06-18 Thread Romi
Hi, I want to implement a spellchecker in my search application using Solr. I
made the required changes in solrconfig.xml and ran the URL
http://localhost:8983/solr/spell?q=hell ultrashar&spellcheck=true&spellcheck.collate=true&spellcheck.build=true
as given in http://wiki.apache.org/solr/SpellCheckComponent.

But I am confused about why I need to give the q query parameter here, since I
assumed this URL is only used to build the spellchecker index.

Anyhow, I got the spellchecker index built through it, but when I search I
don't get the result. For example, in the URL above q=hell, which is actually
dell in the index. When I run the spellcheck URL I get the suggestion dell for
hell, but when I search for hell I get no results.

For searching I use the URL
http://localhost:8983/solr/select/?q=hell&version=2.2&start=0&rows=10&indent=on

Do I need to change the URL?

Please explain.
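
One common pattern, sketched below in Python, is to send the user's text to the /spell handler (q is the text to be checked, and spellcheck.build=true is only needed when the dictionary has to be rebuilt) and then reissue the search against /select using the suggested collation. The helper names are made up for the example, and it assumes the handlers are set up as in the wiki page mentioned above.

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"

def spell_url(text, build=False):
    """URL asking the /spell handler for suggestions for `text`."""
    params = [("q", text), ("spellcheck", "true"), ("spellcheck.collate", "true")]
    if build:
        # Rebuild the spellcheck dictionary; not needed on every request.
        params.append(("spellcheck.build", "true"))
    return SOLR + "/spell?" + urlencode(params)

def select_url(text, start=0, rows=10):
    """URL for a normal search against the /select handler."""
    params = {"q": text, "start": start, "rows": rows, "indent": "on"}
    return SOLR + "/select/?" + urlencode(params)

def search_with_suggestion(text, collation):
    """Search for the collation if one was suggested, else for the original text."""
    return select_url(collation if collation else text)

print(spell_url("hell"))
print(search_with_suggestion("hell", "dell"))
```

In this sketch the application, not Solr, decides whether to search for the suggestion automatically or to show it to the user as a "did you mean" link.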

-
Thanks & Regards
Romi
--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-do-i-use-solr-spellchecker-in-my-search-application-tp3079090p3079090.html
Sent from the Solr - User mailing list archive at Nabble.com.


solr highliting feature

2011-06-18 Thread Romi
I want to highlight matched terms in some search result values. I use Solr
for this, as it provides a highlighting feature. I configured highlighting in
solrconfig.xml and set hl=true and hl.fl=somefield at query time in my URL.
When I run the URL it gives me an XML representation of the search results
that includes a highlighting element.

I then parse this XML response to show the results in a JSP page, but I don't
understand how to highlight the fields in the JSP page.
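
The highlighting section of the response is keyed by each document's unique key and holds, per field, a list of snippets in which the matched terms are already wrapped in markup (em tags by default). A minimal Python sketch of merging those snippets back into the documents follows. It assumes the JSON response format (wt=json) rather than XML, and a unique key field called id; the same idea applies when parsing the XML in Java.

```python
def apply_highlights(response, id_field="id"):
    """Return copies of the docs with highlighted snippets replacing field values.

    `response` is a parsed Solr JSON response. The id field name and the
    " ... " snippet separator are assumptions made for this sketch.
    """
    highlights = response.get("highlighting", {})
    merged = []
    for doc in response["response"]["docs"]:
        doc = dict(doc)  # copy, so the original response is left untouched
        for field, snippets in highlights.get(str(doc[id_field]), {}).items():
            if snippets:
                doc[field] = " ... ".join(snippets)
        merged.append(doc)
    return merged

sample = {
    "response": {"docs": [{"id": "1", "name": "dell laptop"},
                          {"id": "2", "name": "hp printer"}]},
    "highlighting": {"1": {"name": ["<em>dell</em> laptop"]}, "2": {}},
}
for doc in apply_highlights(sample):
    print(doc["id"], doc["name"])
```

When the merged value is written into the page, it must be output without escaping the em tags, and the page's stylesheet can then style em to produce the visible highlight.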

-
Thanks & Regards
Romi
--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-highliting-feature-tp3079239p3079239.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Showing facet of first N docs

2011-06-18 Thread lee carroll
Hi Tommaso

I don't think you can achieve what you want using vanilla Solr.
Facet counts are computed over the whole matching result set, not over the
top n matching documents.

However, what is your use case? Assuming it's for faceted navigation,
showing facets for only the top n results could be confusing to your users.
The next incremental filter applied by the user would change the relevancy
focus and produce another set of top n facet counts, from a document set
unrelated to the last result set. This could be a very bad user experience:
facet counts would fluctuate (i.e. a filter narrowing the search could
produce an increase in a facet term count, which is very odd), and the result
set could change strangely, with docs floating in and out of the result list.

Relevancy seems to be the answer here. If your docs are scored correctly,
then counting all docs in the result set for the facet counts is correct. Do
you need to improve relevancy?




On 18 June 2011 08:23, Dmitry Kan dmitry@gmail.com wrote:
 Do you mean you would like to boost the facets that contain the most of the
 lemmas?
 What is the user query in this case and if possible, what is the use case
 (may be some other solution exists for what you are trying to achieve)?

 On Thu, Jun 16, 2011 at 5:23 PM, Tommaso Teofili
 tommaso.teof...@gmail.com wrote:

 Thanks Dmitry, but maybe I didn't explain correctly as I am not sure
 facet.offset is the right solution, I'd like not to page but to filter
 facets.
 I'll try to explain better with an example.
 Imagine I make a query and first 2 docs in results have both 'xyz' and
 'abc'
 as values for field 'lemmas' while also other docs in the results have
 'xyz'
 or 'abc' as values of field 'lemmas' then I would like to show facets
 coming from only the first 2 docs in the results thus having :
 <lst name="lemmas">
  <str name="xyz">2</str>
  <str name="abc">2</str>
 </lst>
 You can imagine this like a 'give me only facets related to the most
 relevant docs in the results' functionality.
 Any idea on how to do that?
 Tommaso


 2011/6/16 Dmitry Kan dmitry@gmail.com

  http://wiki.apache.org/solr/SimpleFacetParameters
  facet.offset
 
  This param indicates an offset into the list of constraints to allow
  paging.
 
  The default value is 0.
 
  This parameter can be specified on a per field basis.
 
 
  Dmitry
 
 
  On Thu, Jun 16, 2011 at 1:39 PM, Tommaso Teofili
  tommaso.teof...@gmail.com wrote:
 
   Hi all,
   Do you know if it is possible to show the facets for a particular field
   related only to the first N docs of the total number of results?
   It seems facet.limit doesn't help with it as it defines a window in the
   facet constraints returned.
   Thanks in advance,
   Tommaso
  
 
 
 
  --
  Regards,
 
  Dmitry Kan
 




 --
 Regards,

 Dmitry Kan



relevant result for query with boost factor on parameters

2011-06-18 Thread Naveen Gupta
Hi,
I am trying to achieve this use case with following expectation

three fields

1. field1
2. field2
3. field3

field1 should have the max relevance

field2 should have the next

field3 is the last

The term will be entered by the end user (say *rock roll*).

I want to show the results which contain both *rock* and *roll* in field1
(first).

I want to show the results which contain both *rock* and *roll* in field2
(first).

These should only be returned for a given *field3 (x...@gmail.com)*.

*Special attention:* if field1 does not contain both the terms *rock* and
*roll*, then field2 results should take priority (show the results which have
both terms first, and then show the rest according to boost factor or
relevance).

If neither field contains these terms together, show results as normal, with
field1 having more relevance than field2.

How do I join the results for field3? That is, the above results should be
filtered for a given field3.

I am trying the following, which gives satisfactory results, but not the best
ones:

field1:(rock roll)^20 field2:(rock roll)^4 field3:x...@gmail.com

I was thinking of giving

filed1 field2  field3

but not working.

Can you help in this regard?

What other config should i consider in terms of given context ?
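
One possible way to express these requirements, sketched in Python below, is to move the field3 restriction into a filter query (fq), which restricts the results without affecting the score, and to add clauses that require both terms in a field with a higher boost than the clauses that accept either term. This is only a sketch under assumptions: the any-term boost values and the address user@example.com are placeholders, and the boosts would need tuning against real data.

```python
def boosted_query(terms, owner,
                  all_boosts=(("field1", 20), ("field2", 4)),
                  any_boosts=(("field1", 2), ("field2", 1))):
    """Build q and fq parameters for the ranking described above.

    all_boosts: boost applied when a field contains every term.
    any_boosts: lower boost applied when a field contains at least one term.
    The any_boosts values are illustrative placeholders, not recommendations.
    """
    required = " ".join("+" + t for t in terms)  # every term must match
    optional = " ".join(terms)                   # any term may match
    clauses = ["%s:(%s)^%d" % (f, required, b) for f, b in all_boosts]
    clauses += ["%s:(%s)^%d" % (f, optional, b) for f, b in any_boosts]
    return {"q": " ".join(clauses), "fq": 'field3:"%s"' % owner}

# Hypothetical address used in place of the real field3 value.
params = boosted_query(["rock", "roll"], "user@example.com")
print(params["q"])
print(params["fq"])
```

Documents with both terms in field1 pick up the largest boost, followed by documents with both terms in field2, while partial matches still appear further down the list.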


Thanks
Naveen


Why does paste get parsed into past?

2011-06-18 Thread Gabriele Kahlout
Hello,

Debugging query results I find that:
<str name="querystring">paste</str>
<str name="parsedquery">content:past</str>

Now paste and past are two different words. Why does Solr not treat them as
different? How do I make it do so?

--
Regards,
K. Gabriele

--- unchanged since 20/9/10 ---
P.S. If the subject contains [LON] or the addressee acknowledges the
receipt within 48 hours then I don't resend the email.
subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧
time(x) < Now + 48h) ⇒ ¬resend(I, this).

If an email is sent by a sender that is not a trusted contact or the
email does not contain a valid code then the email is not received. A
valid code starts with a hyphen and ends with X.
∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y
∈ L(-[a-z]+[0-9]X)).


Why are not query keywords treated as a set?

2011-06-18 Thread Gabriele Kahlout
q=past past

1.0 = (MATCH) sum of:
*  0.5 = (MATCH) fieldWeight(content:past in 0), product of:*
   1.0 = tf(termFreq(content:past)=1)
   1.0 = idf(docFreq=1, maxDocs=2)
   0.5 = fieldNorm(field=content, doc=0)
*  0.5 = (MATCH) fieldWeight(content:past in 0), product of:*
   1.0 = tf(termFreq(content:past)=1)
   1.0 = idf(docFreq=1, maxDocs=2)
   0.5 = fieldNorm(field=content, doc=0)

Is there a way to treat the query keywords as a set, so that a repeated term
is only scored once?
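
One simple client-side workaround, sketched in Python below, is to remove duplicate keywords before the query is sent to Solr. It assumes a plain whitespace-separated keyword query with no phrases, operators or field prefixes. Note that it compares the raw words, so two different words that the analyzer reduces to the same term would still both be sent.

```python
def unique_terms(query):
    """Drop repeated keywords while keeping the first-seen order."""
    return " ".join(dict.fromkeys(query.split()))

print(unique_terms("past past"))          # past
print(unique_terms("solr tag solr cloud"))  # solr tag cloud
```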

-- 
Regards,
K. Gabriele



Boost Strangeness

2011-06-18 Thread Judioo
WONDERFUL!
Just reporting back.
This document is ACE

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

It explains what the filters are and how to affect the analyzer.

Erik, your statement "First, boosting isn't absolute" played on my mind, so
I continued to investigate boosting.

I found this document that ( at last ) explains the dismax logic

http://www.lucidimagination.com/blog/2010/05/23/whats-a-dismax/

The reason why I was not getting the order I required was that:
a) my boost values were too close together;
b) similar ids in a document affected the score.


It seems that if a partial match is made, the product (a % of the
total boost) contributes to the document's score.
This meant that one type of document in the index had a higher
aggregate score because it had all but one of the boosted
fields (it does not have parent_id) and those fields were
populated with content that was *very* similar to the requested id.

For example:

required id = b011mg62
X_id = b011mgsf

Due to the partial matching and the closeness of the boost ranges, this
type of document always acquired a higher score than another document
with just one matching field (i.e. the id field).

My solution was to increase the boost of the fields I wanted to *really* count:

id^10 parent_id^5000 brand_container_id^500

As a result, even if there are similar matches in any field, the id and
parent_id matches should always receive a higher boost.


This was also useful
http://stackoverflow.com/questions/2179497/adding-date-boosting-to-complex-solr-queries


Thanks for the help!


Re: Why does paste get parsed into past?

2011-06-18 Thread François Schiettecatte
What do you have set up for stemming?

François

On Jun 18, 2011, at 8:00 AM, Gabriele Kahlout wrote:

 Hello,
 
 Debugging query results I find that:
 <str name="querystring">paste</str>
 <str name="parsedquery">content:past</str>
 
 Now paste and past are two different words. Why does Solr not consider
 that? How do I make it?
 
 --
 Regards,
 K. Gabriele
 



merging highlights

2011-06-18 Thread Jamie Johnson
I have a setup with a title field and a phonetic_title field. I'm using the
edismax query parser and doing a weighted search across the two fields.
There are cases where phonetic_title matches part of the string and title
matches another, e.g. if my query was foo AND subject:bar and the fields had

title: phoo bar
phonetic_title: foo br (obviously making this up)

I'd get back in the highlights the following

phonetic_title: <em>foo</em> bar
title: foo <em>bar</em>

Is there any utility to merge these results so that the title looks like

title: <em>foo</em> <em>bar</em>
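
Outside of Solr, a merge like this could be done on the client. The Python sketch below is one hypothetical approach, not an existing Solr utility. It assumes both snippets split into the same number of whitespace-separated tokens and that each highlight wraps exactly one token; when the token counts differ it gives up and returns the primary snippet unchanged.

```python
import re

# Matches a single token that is fully wrapped in <em>...</em>.
EM = re.compile(r"^<em>(.*)</em>$")

def merge_highlights(primary, secondary):
    """Highlight a token in `primary` if it is highlighted in either snippet."""
    a, b = primary.split(), secondary.split()
    if len(a) != len(b):
        return primary  # tokens cannot be aligned, leave the snippet as it is
    merged = []
    for tok_a, tok_b in zip(a, b):
        if EM.match(tok_b) and not EM.match(tok_a):
            tok_a = "<em>" + tok_a + "</em>"
        merged.append(tok_a)
    return " ".join(merged)

print(merge_highlights("foo <em>bar</em>", "<em>foo</em> bar"))
# <em>foo</em> <em>bar</em>
```

The text of the primary snippet is always kept, so the phonetic field only contributes the positions of its highlights, never its own wording.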


Re: Why does paste get parsed into past?

2011-06-18 Thread Gabriele Kahlout
I'm not sure where those are set, but on reflection I'd keep the default
settings. My real issue is "Why are not query keywords treated as a set?":
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201106.mbox/%3CBANLkTikHunhyWc2WVTofRYU4ZW=c8oe...@mail.gmail.com%3E

2011/6/18 François Schiettecatte fschietteca...@gmail.com

 What do you have set up for stemming?

 François

 On Jun 18, 2011, at 8:00 AM, Gabriele Kahlout wrote:

  Hello,
 
  Debugging query results I find that:
  <str name="querystring">paste</str>
  <str name="parsedquery">content:past</str>
 
  Now paste and past are two different words. Why does Solr not consider
  that? How do I make it?
 
  --
  Regards,
  K. Gabriele
 




-- 
Regards,
K. Gabriele



Re: Why does paste get parsed into past?

2011-06-18 Thread François Schiettecatte
What I meant was: what stemmer are you using? Maybe it is the stemmer that is
cutting the 'e'. You can check that on the field analysis page of the Solr
admin web interface.

François

On Jun 18, 2011, at 11:42 AM, Gabriele Kahlout wrote:

 I'm not sure where those are set, but on reflection I'd keep the default
 settings. My real issue is "Why are not query keywords treated as a set?":
 http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201106.mbox/%3CBANLkTikHunhyWc2WVTofRYU4ZW=c8oe...@mail.gmail.com%3E
 2011/6/18 François Schiettecatte fschietteca...@gmail.com
 
 What do you have set up for stemming?
 
 François
 
 On Jun 18, 2011, at 8:00 AM, Gabriele Kahlout wrote:
 
 Hello,
 
 Debugging query results I find that:
 <str name="querystring">paste</str>
 <str name="parsedquery">content:past</str>
 
 Now paste and past are two different words. Why does Solr not consider
 that? How do I make it?
 
 --
 Regards,
 K. Gabriele
 
 
 
 
 
 -- 
 Regards,
 K. Gabriele
 



Is it true that I cannot delete stored content from the index?

2011-06-18 Thread Gabriele Kahlout
Hello,

I've been indexing with the content field stored. Now I'd like to delete all
stored content. Is there a way to do that without re-indexing?

It seems not, from the Lucene FAQ
(http://wiki.apache.org/lucene-java/LuceneFAQ#How_do_I_update_a_document_or_a_set_of_documents_that_are_already_indexed.3F):

"How do I update a document or a set of documents that are already
indexed? There is no direct update procedure in Lucene. To update an index
incrementally you must first *delete* the documents that were updated, and
*then re-add* them to the index."

-- 
Regards,
K. Gabriele



Re: merging highlights

2011-06-18 Thread Jamie Johnson
Perhaps a better question is this. Looking at DefaultSolrHighlighter, I'd
like to make modifications so that when highlighting a specific field it
automatically checks whether there is a corresponding _phonetic field
available to also pull tokens from. It looks like I could do this by extending
DefaultSolrHighlighter and overriding doHighlightingByHighlighter with my own
version, which does the following.

Around line 431:
TokenStream tvStream =
TokenSources.getTokenStream(searcher.getIndexReader(), docId, fieldName);
TokenStream tvStream_phonetic =
TokenSources.getTokenStream(searcher.getIndexReader(), docId, fieldName +
"_phonetic");
tvStream_phonetic.copyTo(tvStream);

and further down, around line 446:
// fall back to analyzer
tstream = createAnalyzerTStream(schema, fieldName,
docTexts[j]);
TokenStream tstream_phonetic = createAnalyzerTStream(schema,
fieldName + "_phonetic", docTexts[j]);
tstream_phonetic.copyTo(tstream);

Now I'd obviously need a way to see if fieldName + "_phonetic" exists
before making the calls to get the token streams for it (anyone have an
idea how to do that?).

Only problem is this would only work for the default highlighter, using fast
vector highlighter I don't see a clear way to do this.

Am I the only person looking to do something along these lines?

On Sat, Jun 18, 2011 at 9:59 AM, Jamie Johnson jej2...@gmail.com wrote:

 I have a setup where I have a title and a phonetic_title, I'm using the
 edismax query parser and doing a weighted search across the two fields,
 there are cases where phonetic_title matches part of the string and title
 matches another, i.e. if my query was foo AND subject:bar and the fields had

 title: phoo bar
 phonetic_title: foo br (obviously making this up)

 I'd get back in the highlights the following

 phonetic_title: <em>foo</em> bar
 title: foo <em>bar</em>

 Is there any utility to merge these results so that the title looks like

 title: <em>foo</em> <em>bar</em>





Re: Multiple indexes

2011-06-18 Thread shacky
2011/6/15 Edoardo Tosca e.to...@sourcesense.com:
 Try to use multiple cores:
 http://wiki.apache.org/solr/CoreAdmin

Can I do concurrent searches on multiple cores?


Re: Multiple indexes

2011-06-18 Thread François Schiettecatte
Sure.

François

On Jun 18, 2011, at 2:25 PM, shacky wrote:

 2011/6/15 Edoardo Tosca e.to...@sourcesense.com:
 Try to use multiple cores:
 http://wiki.apache.org/solr/CoreAdmin
 
 Can I do concurrent searches on multiple cores?



Re: Multiple indexes

2011-06-18 Thread shacky
On 18 June 2011 20:27, François Schiettecatte
fschietteca...@gmail.com wrote:
 Sure.

So can I run searches similar to a JOIN in MySQL?
The problem is that I need to search data in at least two tables.


Re: Multiple indexes

2011-06-18 Thread François Schiettecatte
You would need to run two independent searches and then 'join' the results.
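
As a rough sketch of what joining the results on the client could look like, the Python function below pairs up documents from two result lists on a shared key. The core contents and field names (author_id, id) are invented for the example, and it assumes both result lists are small enough to hold in memory.

```python
def join_results(left_docs, right_docs, left_key, right_key):
    """Pair each left doc with every right doc whose key value matches."""
    by_key = {}
    for doc in right_docs:
        by_key.setdefault(doc[right_key], []).append(doc)
    pairs = []
    for doc in left_docs:
        for match in by_key.get(doc[left_key], []):
            pairs.append((doc, match))
    return pairs

# Hypothetical results from two independent searches on two cores.
books = [{"id": "b1", "author_id": "a1", "title": "Solr basics"},
         {"id": "b2", "author_id": "a9", "title": "Orphan"}]
authors = [{"id": "a1", "name": "Ann"}, {"id": "a2", "name": "Bob"}]

for book, author in join_results(books, authors, "author_id", "id"):
    print(book["title"], "by", author["name"])
```

This behaves like an inner join: documents without a partner in the other result list are dropped.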

It is best not to apply a 'SQL' mindset to Solr when it comes to
(de)normalization. Whereas you strive for normalization in SQL, that is usually
counter-productive in Solr. For example, I am working on a project with 30+
normalized tables, but only 4 cores.

Perhaps describing what you are trying to achieve would give us greater insight
and allow us to make more concrete recommendations?

Cheers

François 

On Jun 18, 2011, at 2:36 PM, shacky wrote:

 On 18 June 2011 20:27, François Schiettecatte
 fschietteca...@gmail.com wrote:
 Sure.
 
 So I can have some searches similar to JOIN on MySQL?
 The problem is that I need at least two tables in which search data..



Optimize taking two steps and extra disk space

2011-06-18 Thread Shawn Heisey
I've noticed something odd in Solr 3.2 when it does an optimize.  One of 
my shards (freshly built via DIH full-import) had 37 segments, totalling 
17.38GB of disk space.  13 of those segments were results of merges 
during initial import, the other 24 were untouched after creation.  
Starting at _0, the final segment before optimizing is _co.  The 
mergefactor on the index is 35, chosen because it makes merged segments 
line up nicely on z boundaries.


The optimization process created a _cp segment of 14.4GB, followed by a
_cq segment at the final 17.27GB size, so at the peak it took 49GB of
disk space to hold the index.
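
The 49GB figure follows from adding the three sizes reported above, on the assumption that the original segments, the intermediate _cp segment and the final _cq segment all existed on disk at the same moment. A small Python check of that arithmetic:

```python
def peak_disk_gb(*segment_sets_gb):
    """Sum the sizes of everything assumed to be on disk at the same time."""
    return sum(segment_sets_gb)

original_gb = 17.38      # the 37 segments before the optimize
intermediate_gb = 14.4   # the _cp segment
final_gb = 17.27         # the _cq segment

peak = peak_disk_gb(original_gb, intermediate_gb, final_gb)
print(round(peak, 2))    # 49.05
```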


Is there any way to make it do the optimize in one pass?  Is there a 
compelling reason why it does it this way?


Thanks,
Shawn



Re: Is it true that I cannot delete stored content from the index?

2011-06-18 Thread Erick Erickson
Yep, you've got to delete and re-add. Although if you have a uniqueKey
defined, you can just re-add that document and Solr will automatically delete
the underlying document.

You might have to optimize the index afterwards to get the data to really
disappear, since the deletion process just marks the document as deleted.
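
For reference, the delete, re-add, commit and optimize steps correspond to XML update messages posted to Solr's update handler. The Python sketch below only builds those message bodies; the helper names and the field names in the example are made up, and sending the messages over HTTP is left out.

```python
from xml.sax.saxutils import escape, quoteattr

def delete_by_id(doc_id):
    """Update message that deletes one document by its uniqueKey value."""
    return "<delete><id>%s</id></delete>" % escape(str(doc_id))

def add_doc(fields):
    """Update message that (re-)adds one document.

    With a uniqueKey defined, an existing document with the same key is
    replaced.
    """
    body = "".join(
        "<field name=%s>%s</field>" % (quoteattr(name), escape(str(value)))
        for name, value in fields.items()
    )
    return "<add><doc>%s</doc></add>" % body

COMMIT = "<commit/>"      # makes the changes visible to searches
OPTIMIZE = "<optimize/>"  # merges segments, dropping documents marked deleted

# Hypothetical document re-added without the previously stored content field.
print(delete_by_id("doc-42"))
print(add_doc({"id": "doc-42", "title": "a < b"}))
```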

Best
Erick

On Sat, Jun 18, 2011 at 1:20 PM, Gabriele Kahlout
gabri...@mysimpatico.com wrote:
 Hello,

 I've indexing with the content field stored. Now I'd like to delete all
 stored content, is there how to do that without re-indexing?

 It seems not from lucene FAQ
 http://wiki.apache.org/lucene-java/LuceneFAQ#How_do_I_update_a_document_or_a_set_of_documents_that_are_already_indexed.3F
 :
 How do I update a document or a set of documents that are already
 indexed? There
 is no direct update procedure in Lucene. To update an index incrementally
 you must first *delete* the documents that were updated, and *then
 re-add* them to the index.

 --
 Regards,
 K. Gabriele




Re: Is it true that I cannot delete stored content from the index?

2011-06-18 Thread Mohammad Shariq
I have defined a uniqueKey in my Solr schema and delete docs from Solr using
this uniqueKey, and then run an optimize once a day.
Is this the right way to delete?

On 19 June 2011 05:14, Erick Erickson erickerick...@gmail.com wrote:

 Yep, you've got to delete and re-add. Although if you have a
 uniqueKey defined you
 can just re-add that document and Solr will automatically delete the
 underlying
 document.

 You might have to optimize the index afterwards to get the data to really
 disappear since the deletion process just marks the document as
 deleted.

 Best
 Erick

 On Sat, Jun 18, 2011 at 1:20 PM, Gabriele Kahlout
 gabri...@mysimpatico.com wrote:
  Hello,
 
  I've indexing with the content field stored. Now I'd like to delete all
  stored content, is there how to do that without re-indexing?
 
  It seems not from lucene
  FAQ
 http://wiki.apache.org/lucene-java/LuceneFAQ#How_do_I_update_a_document_or_a_set_of_documents_that_are_already_indexed.3F
 
  :
  How do I update a document or a set of documents that are already
  indexed? There
  is no direct update procedure in Lucene. To update an index incrementally
  you must first *delete* the documents that were updated, and *then
  re-add* them to the index.
 
  --
  Regards,
  K. Gabriele
 
 




-- 
Thanks and Regards
Mohammad Shariq