Re: performance jetty (jetty.xml)

2011-10-19 Thread Gastone Penzo
ok Thanx ;)

2011/10/19 Otis Gospodnetic otis_gospodne...@yahoo.com

 Gastone,

 Those numbers are probably OK.  Let us know if you have any actual problems
 with Solr 3.4.  Oh, and use the solr-user mailing list instead please.
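
 For reference, the settings in question live in the ThreadPool section of
 example/etc/jetty.xml. A minimal sketch, assuming the Jetty 6
 QueuedThreadPool that Solr 3.x ships with; the numbers here are
 illustrative placeholders, not the actual defaults:

   <Set name="ThreadPool">
     <New class="org.mortbay.thread.QueuedThreadPool">
       <Set name="minThreads">10</Set>   <!-- threads kept alive when idle -->
       <Set name="maxThreads">200</Set>  <!-- hard cap on request threads -->
       <Set name="lowThreads">20</Set>   <!-- threshold for low-resources mode -->
     </New>
   </Set>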

 Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/

 --
 *From:* Gastone Penzo gastone.pe...@gmail.com
 *To:* solr-user@lucene.apache.org; d...@lucene.apache.org
 *Sent:* Tuesday, October 18, 2011 10:03 AM
 *Subject:* performance jetty (jetty.xml)

 Hi,
 i just changed my solr installation from 1.4 to 3.4..
 i noticed that the jetty configuration file (jetty.xml) has also changed.
 the default thread numbers are higher, the threadpool is bigger
 and other default values are higher. is this normal??

 what values for these settings do you think are correct for me?
 i have a dedicated machine with 2 solr instances inside.
 my machine has 8gb of ram and 8 cpus..

 i do about 200.000 - 250.000 calls to solr a day...

 can someone help me??

 - Threads number (min, max and low)
 - corepool size and maximum poolsize








-- 
Gastone Penzo


Re: add thumbnail image for search result

2011-10-19 Thread Paul Libbrecht
Hadi,

I do not think solr or solrj does this.
Are your documents HTML documents? I would look in the crawler resources, but I 
note that rendering is a rather server-unfriendly task and it bears some 
security risk if the documents are not fully trusted.

In i2geo.net, we finally gave up on automated rendering, we allowed the users 
to upload a snapshot; this gives thumbnails that focus on the relevant things 
instead of a global picture of the initial situation (which, with learning 
resources, is often close to a blank page).

paul


On 19 Oct 2011, at 07:53, hadi wrote:

 I want to know how i can add a thumbnail image for my files when i am indexing
 files with solrj?
 thanks
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/add-thumnail-image-for-search-result-tp3433440p3433440.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: score based on unique words matching???

2011-10-19 Thread Ahmet Arslan
  Heres my problem :
  
  field1 (text) - subject
  q=david bowie changes
  
  Problem : If a record mentions david bowie a lot,
 it beats out something more relevant (more unique matches)
 ...
  
  A. (now appearing david bowie at the cineplex 7pm
 david bowie goes on stage, then mr. bowie will sign
 autographs)
  B. song :david bowie - changes
  
  (A) ends up more relevant because of the frequency
 or number of words in it.. not cool...
  I want it so the number of words matching will
 trump density/weight

You need to disable the Term Frequency (tf) factor. I am not sure whether a plain 
omitTf option is available, but omitTermFreqAndPositions exists. If you mark your 
field as omitTermFreqAndPositions=true you will obtain what you want. But phrase 
queries won't work with this.
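
A minimal schema.xml sketch of that flag; the field and type names here are
made up for illustration:

  <field name="subject" type="text" indexed="true" stored="true"
         omitTermFreqAndPositions="true"/>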


Re: IndexBasedSpellChecker on multiple fields

2011-10-19 Thread Simone Tripodi
Hi James!
terrific suggestion, thanks a lot!!! And sorry for the delay (due to
my timezone ;) )
I'll let you know how things will go, thanks once again and have a nice day!
Simo

http://people.apache.org/~simonetripodi/
http://simonetripodi.livejournal.com/
http://twitter.com/simonetripodi
http://www.99soft.org/



On Tue, Oct 18, 2011 at 5:16 PM, Dyer, James james.d...@ingrambook.com wrote:
 Simone,

 You can set up a master dictionary but with a few caveats.  What you'll 
 need to do is copyfield all of the fields you want to include in your 
 master dictionary into one field and base your IndexBasedSpellChecker 
 dictionary on that.  In addition, I would recommend you use the collate 
 feature and set spellcheck.maxCollationTries to something greater than zero 
 (5-10 is usually good).  Otherwise, you probably will get a lot of ridiculous 
 suggestions from it trying to correct words from one field with values from 
 another.  See 
 http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collate for more 
 information.

 There is still a big problem with this approach, however.  Unless you set 
 onlyMorePopular=true, Solr will never suggest a correction for a word that 
 exists in the dictionary.  By creating a huge master dictionary, you will 
 be increasing the chances that Solr will assume your users' misspelled words 
 are in fact correct.  One way to work around this is instead of blindly using 
 copyField, to hand-pick a subset of your terms for the master field on 
 which you base your dictionary.  Another workaround is to use 
 onlyMorePopular, although this has its own problems.  See the discussion 
 for SOLR-2585 (https://issues.apache.org/jira/browse/SOLR-2585), which aims 
 to solve these problems.
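
 A sketch of the copyField-based master dictionary described above; the
 field names, field type, and index directory are assumptions, not a
 drop-in config:

   <!-- schema.xml -->
   <field name="spell" type="textSpell" indexed="true" stored="false"
          multiValued="true"/>
   <copyField source="title" dest="spell"/>
   <copyField source="author" dest="spell"/>

   <!-- solrconfig.xml -->
   <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
     <lst name="spellchecker">
       <str name="name">master</str>
       <str name="classname">solr.IndexBasedSpellChecker</str>
       <str name="field">spell</str>
       <str name="spellcheckIndexDir">./spellchecker</str>
       <str name="buildOnCommit">true</str>
     </lst>
   </searchComponent>

 and at query time something like
 spellcheck=true&spellcheck.collate=true&spellcheck.maxCollationTries=5.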

 James Dyer
 E-Commerce Systems
 Ingram Content Group
 (615) 213-4311


 -Original Message-
 From: simone.trip...@gmail.com [mailto:simone.trip...@gmail.com] On Behalf Of 
 Simone Tripodi
 Sent: Tuesday, October 18, 2011 7:06 AM
 To: solr-user@lucene.apache.org
 Subject: IndexBasedSpellChecker on multiple fields

 Hi all guys,
 I need to configure the IndexBasedSpellChecker that uses more than
 just one field as a spelling dictionary, is it possible to achieve?
 In the meanwhile I configured two spellcheckers and let users switch
from one checker to another via params on the GET request, but it looks like
 people are not particularly happy about it...
The main problem is that the fields I need to spellcheck contain different
information; I mean the intersection between the two sets could be
 empty.
 Many thanks in advance, all the best!
 Simo

 http://people.apache.org/~simonetripodi/
 http://simonetripodi.livejournal.com/
 http://twitter.com/simonetripodi
 http://www.99soft.org/



Re: Dismax boost + payload boost

2011-10-19 Thread Jean-Claude Dauphin
Hello Milan,

You may also be interested in the following article: Using Payloads with
DisMaxQParser in SOLR:
http://digitalpebble.blogspot.com/2010/08/using-payloads-with-dismaxqparser-in.html

I have implemented something close to what is explained in this article and I
am now checking in depth whether it works as I expect.

I have some problems with the bq parameter when trying to dynamically generate
several bq statements with setParam("bq", arrayOfValues).
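
A minimal SolrJ sketch of passing several bq values in one call; the field
names and boosts are made up:

  import org.apache.solr.client.solrj.SolrQuery;

  SolrQuery q = new SolrQuery("david bowie changes");
  q.setParam("defType", "dismax");
  q.setParam("qf", "subject");
  // setParam takes varargs, so each value becomes its own bq parameter
  q.setParam("bq", "genre:rock^2.0", "format:album^1.5");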

Best





On Wed, Oct 19, 2011 at 12:02 AM, Milan Dobrota mi...@milandobrota.comwrote:

 Is it possible to combine dismax boost (query time) and payload boost
 (index
 time)?

 I've done something very similar to this post
 http://sujitpal.blogspot.com/2011/01/payloads-with-solr.html but it seems
 that query time boosts get ignored.




-- 
Jean-Claude Dauphin

jc.daup...@gmail.com
jc.daup...@afus.unesco.org

http://kenai.com/projects/j-isis/
http://www.unesco.org/isis/
http://www.unesco.org/idams/
http://www.greenstone.org


Dismax and phrases

2011-10-19 Thread Hyttinen Lauri

Hello,

I've inherited a solr-lucene project which I continue to develop. This 
particular SOLR (1.4.1) uses dismax for the queries, but I am getting 
some results that I do not understand. Mainly, when I search for two 
terms I get some results; however, when I put quotes around the two terms 
I get a lot more results, which goes against my understanding of what 
should happen, i.e. a smaller set of results. Where should I start digging 
for the answer? solrconfig.xml or some other place?


Best regards,
Lauri Hyttinen


Optimization /Commit memory

2011-10-19 Thread Sujatha Arun
Do we require 2 or 3 times the RAM or hard disk space while
performing a Commit or an Optimize, or both?

What is the requirement in terms of size of RAM and HD for commit and
optimize?

Regards
Sujatha


Re: solr/lucene and its database (a silly question)

2011-10-19 Thread lorenlai
Hello Alireza,

thank you for the link again ;-)

Cheers

Loren

--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-lucene-and-its-database-a-silly-question-tp3431436p3433803.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr/lucene and its database (a silly question)

2011-10-19 Thread lorenlai
Hi Robert,

also many thanks to you and your shortly descriptions/explanations to my
questions again were really helpful.


Cheers  have a nice day

Loren 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-lucene-and-its-database-a-silly-question-tp3431436p3433811.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr MultiValue Fields and adding values

2011-10-19 Thread Tiernan OToole

I was hoping that wasn't going to be the case... I ended up querying for
all unique IDs in the DB, and then querying for each unique ID and
getting all names, and then inserting them that way... Seems a lot
slower than it really should be in theory...
 
Thanks.
 
--Tiernan
 
On 18/10/2011 23:20, Otis Gospodnetic wrote:
 Hi,

 You'll need to construct the whole document and index it as such. You
can't append values to document fields.

 Otis
 

 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/


 
 From: Tiernan OToole lsmart...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Tuesday, October 18, 2011 11:41 AM
 Subject: Solr MultiValue Fields and adding values

 Good morning.

 I asked this question on StackOverflow, but thought this group may be
 able to help... the question is available on SO here:
http://bit.ly/r6MAWU

 here goes:

 I am building a search engine, and have a not so unique ID for a lot of
 different names... So, for example, there could be an id of B0051QVF7A
 which would have multiple names like "Kindle", "Amazon Kindle", "Amazon
 Kindle 3G", "Kindle Ebook Reader", "New Kindle" etc.

 The problem, and question i have, is that i am trying to enter this data
 from a DB of 11 ish million rows. each is being read one at a time. So i
 don't have all the names of each ID. I am adding new documents to the
 list each time.

 What i am trying to find out is how do i add names to an existing
 Document? if i am reading documentation correctly, it seems to overwrite
 the whole document, not add extra info to the field... i just want to
 add an extra name to the document multivalue field...

 I know this could cause some weird and wonderful issues if a name is
 removed (in the example above, "New Kindle" could be removed when a
 newer Kindle gets released) but i am thinking of recreating the index
 every now and again, to clear out issues like that (once a month or so.
 Its taking about 45min currently to create the index).

 So, how do you add a value to a multivalue field in solr for an existing
 document?

 Thanks in advance.

 --Tiernan



 



Re: Solr scraping: Nutch and other alternatives.

2011-10-19 Thread Luis Cappa Banda
Hello Marco, Markus and Óscar.

Thank you very much for your answers. What you suggest, Óscar, sounds very
interesting. I mean the alternative that covers data mining with any
'popular searcher'. Do you know any tutorial or book that can teach me the
first steps?

Bye!


Re: Find Documents with field = maxValue

2011-10-19 Thread Alireza Salimi
What I'm looking for is to do everything in single shot in Solr.
I'm not even sure if it's possible or not.
Finding the max value and then running another query is NOT my ideal
solution.

Thanks everybody


On Tue, Oct 18, 2011 at 6:28 PM, Sujit Pal sujit@comcast.net wrote:

 Hi Alireza,

 Would this work? Sort the results by age desc, then loop through the
 results as long as age == age[0].

 -sujit
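
 A minimal SolrJ sketch of that client-side loop, untested; it assumes an
 existing SolrServer "server", a numeric "age" field, and at most 100 ties:

   import java.util.ArrayList;
   import java.util.List;
   import org.apache.solr.client.solrj.SolrQuery;
   import org.apache.solr.common.SolrDocument;
   import org.apache.solr.common.SolrDocumentList;

   SolrQuery q = new SolrQuery("*:*");
   q.setSortField("age", SolrQuery.ORDER.desc);
   q.setRows(100); // assumption: no more than 100 documents share the max
   SolrDocumentList docs = server.query(q).getResults();
   Object max = docs.get(0).getFieldValue("age");
   List<SolrDocument> oldest = new ArrayList<SolrDocument>();
   for (SolrDocument d : docs) {
       if (!max.equals(d.getFieldValue("age"))) break; // past the ties
       oldest.add(d);
   }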

 On Tue, 2011-10-18 at 15:23 -0700, Otis Gospodnetic wrote:
  Hi,
 
  Are you just looking for:
 
   age:<target age>
 
  This will return all documents/records where age field is equal to target
 age.
 
  But maybe you want
 
   age:[0 TO <target age here>]
 
  This will include people aged from 0 to target age.
 
  Otis
  
  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
  Lucene ecosystem search :: http://search-lucene.com/
 
 
  
  From: Alireza Salimi alireza.sal...@gmail.com
  To: solr-user@lucene.apache.org
  Sent: Tuesday, October 18, 2011 10:15 AM
  Subject: Re: Find Documents with field = maxValue
  
  Hi Ahmet,
  
  Thanks for your reply, but I want ALL documents with age = max_age.
  
  
  On Tue, Oct 18, 2011 at 9:59 AM, Ahmet Arslan iori...@yahoo.com
 wrote:
  
  
  
   --- On Tue, 10/18/11, Alireza Salimi alireza.sal...@gmail.com
 wrote:
  
From: Alireza Salimi alireza.sal...@gmail.com
Subject: Find Documents with field = maxValue
To: solr-user@lucene.apache.org
Date: Tuesday, October 18, 2011, 4:10 PM
Hi,
   
It might be a naive question.
Assume we have a list of Document, each Document contains
the information of
a person,
there is a numeric field named 'age', how can we find those
Documents whose
*age* field
is *max(age) *in one query.
  
    Maybe http://wiki.apache.org/solr/StatsComponent ?
  
    Or sort by age?  q=*:*&start=0&rows=1&sort=age desc
  
  
  
  
  --
  Alireza Salimi
  Java EE Developer
  
  
  




-- 
Alireza Salimi
Java EE Developer


RE: Filter Question

2011-10-19 Thread Monica Skidmore
Thanks Steven, that's just the kind of feedback I needed.  And thanks also to 
Jan.  I'll do a little clean-up on my filter and submit it...

  -Monica

-Original Message-
From: Steven A Rowe [mailto:sar...@syr.edu] 
Sent: Friday, October 14, 2011 3:18 AM
To: solr-user@lucene.apache.org
Subject: RE: Filter Question

Hi Monica,

AFAIK there is nothing like the filter you've described, and I believe it would 
be generally useful.  Maybe it could be called StopTermTypesFilter?  (Plural on 
Types to signify that more than one type of term can be stopped by a single 
instance of the filter.)  

Such a filter should have an enablePositionIncrements option like StopFilter.

Steve
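
A minimal sketch of such a filter against the Lucene 3.x TokenFilter API,
using the class name suggested above; untested, and it ignores position
increments for brevity:

  import java.io.IOException;
  import java.util.Set;
  import org.apache.lucene.analysis.TokenFilter;
  import org.apache.lucene.analysis.TokenStream;
  import org.apache.lucene.analysis.tokenattributes.TypeAttribute;

  public final class StopTermTypesFilter extends TokenFilter {
    private final Set<String> stopTypes;
    private final TypeAttribute typeAtt = addAttribute(TypeAttribute.class);

    public StopTermTypesFilter(TokenStream input, Set<String> stopTypes) {
      super(input);
      this.stopTypes = stopTypes;
    }

    @Override
    public boolean incrementToken() throws IOException {
      // emit only tokens whose type attribute is not in the stop set
      while (input.incrementToken()) {
        if (!stopTypes.contains(typeAtt.type())) {
          return true;
        }
      }
      return false;
    }
  }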

 -Original Message-
 From: Monica Skidmore [mailto:monica.skidm...@careerbuilder.com]
 Sent: Thursday, October 13, 2011 1:04 PM
 To: solr-user@lucene.apache.org; Otis Gospodnetic
 Subject: RE: Filter Question
 
 Thanks, Otis - yes, this is different from the synonyms filter, which 
 we also use.  For example, if you wanted all tokens that were marked 'lemma'
 to be removed, you could specify that, and all tokens with any type 
 other than 'lemma' would still be returned.  You could also choose to 
 remove all tokens of types 'lemma' and 'word' (although that would 
 probably be a bad idea!), etc.  Normally, if you don't want a token 
 type, you just don't include/run the filter that produces that type.  
 However, we have a third-party filter that produces multiple types, 
 and this allows us to select a subset of those types.
 
 I did see the HowToContribute wiki, but I'm relatively new to solr, 
 and I wanted to see if this looked familiar to someone before I 
 started down the contribution path.
 
 Thanks again!
 
   -Monica
 
 
 -Original Message-
 From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
 Sent: Thursday, October 13, 2011 12:37 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Filter Question
 
 Monica,
 
 This is different from Solr's synonyms filter with different synonyms 
 files, one for index-time and the other for query-time expansion (not 
 sure when you'd want that, but it looks like you need this and like 
 this), right?  If so, maybe you can describe what your filter does 
 differently and then follow 
 http://wiki.apache.org/solr/HowToContribute - thanks in advance! :)
 
 Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene 
 ecosystem search :: http://search-lucene.com/
 
 
 
 From: Monica Skidmore monica.skidm...@careerbuilder.com
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Sent: Thursday, October 13, 2011 11:37 AM
 Subject: Filter Question
 
 Our Solr implementation includes a third-party filter that adds
 additional, multiple term types to the token list (beyond word, 
 etc.).  Most of the time this is exactly what we want, but we felt we 
 could improve our search results by having different tokens on the 
 index and query side.  Since the filter in question was third-party 
 and we didn't have access to source code, we wrote our own filter that 
 will take out tokens based on their term attribute type.
 
 We didn't see another filter available that does this - did we 
 overlook
 it?  And if not, is this something that would be of value if we 
 contribute it back to the Solr community?
 
 Monica Skidmore
 
 
 
 


Painfully slow indexing

2011-10-19 Thread Pranav Prakash
Hi guys,

I have set up a Solr instance and upon attempting to index document, the
whole process is painfully slow. I will try to put as much info as I can in
this mail. Pl. feel free to ask me anything else that might be required.

I am sending documents in batches not exceeding 2,000. The size of each of
them depends but usually is around 10-15MiB. My indexing script tells me
that Solr took T seconds to add N documents of size S. For the same data,
the Solr Log add QTime is QT. Some of the sample data are:

     N      |        S         |    T   |   QT
 -----------------------------------------------
   390 docs |  3,478,804 Bytes |  14.5s |  2297
   852 docs |  6,039,535 Bytes |  25.3s |  4237
  1345 docs | 11,147,512 Bytes |  47s   |  8543
  1147 docs |  9,457,717 Bytes |  44s   |  2297
  1096 docs | 13,058,204 Bytes |  54.3s |  8782

The time T includes the time of converting an array of Hash objects into
XML, POSTing it to Solr, and the response being acknowledged from Solr. Clearly,
there is a huge difference between the times T and QT. After a lot of effort,
I have no clue why these times do not match.

The Server has 16 cores, 48GiB RAM. JVM options are -Xms5000M -Xmx5000M
-XX:+UseParNewGC

I believe my indexing is getting slow. Relevant portions from my solrconfig file
are as follows. On a related note, every document has one dynamic field.
Based on this rate, it takes me ~30hrs to do a full index of my database.
I would really appreciate kindness of community in order to get this
indexing faster.

<indexDefaults>
  <useCompoundFile>false</useCompoundFile>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxMergeCount">10</int>
    <int name="maxThreadCount">10</int>
  </mergeScheduler>
  <ramBufferSizeMB>2048</ramBufferSizeMB>
  <maxMergeDocs>2147483647</maxMergeDocs>
  <maxFieldLength>300</maxFieldLength>
  <writeLockTimeout>1000</writeLockTimeout>
  <maxBufferedDocs>5</maxBufferedDocs>
  <termIndexInterval>256</termIndexInterval>
  <mergeFactor>10</mergeFactor>
  <useCompoundFile>false</useCompoundFile>
  <!-- <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnceExplicit">19</int>
    <int name="segmentsPerTier">9</int>
  </mergePolicy> -->
</indexDefaults>

<mainIndex>
  <unlockOnStartup>true</unlockOnStartup>
  <reopenReaders>true</reopenReaders>
  <deletionPolicy class="solr.SolrDeletionPolicy">
    <str name="maxCommitsToKeep">1</str>
    <str name="maxOptimizedCommitsToKeep">0</str>
  </deletionPolicy>
  <infoStream file="INFOSTREAM.txt">false</infoStream>
</mainIndex>

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>10</maxDocs>
  </autoCommit>
</updateHandler>


Pranav Prakash

temet nosce

Twitter http://twitter.com/pranavprakash | Blog http://blog.myblive.com |
Google http://www.google.com/profiles/pranny


Merging Remote Solr Indexes?

2011-10-19 Thread darren

Hi,
  I thought of a useful capability if it doesn't already exist.

Is it possible to do an index merge between two remote Solr's?

To handle massive index-time scalability, wouldn't it be useful
to have distributed indexes accepting local input, then merge
them into one central index after?

Darren


RE: Solr MultiValue Fields and adding values

2011-10-19 Thread Dyer, James
While Solr/Lucene can't support true document updates, there are 2 ways you 
might be able to work around this in your situation.

1. If you store all of the fields, you can write something that will read back 
everything already indexed to the document, append whatever data you want, then 
write it back.  This will increase index size and possibly make indexing too 
slow.  On the other hand, it might be more efficient than requiring the 
database to return everything in order.

2. You could store your data as multiple documents per id (pick something else 
as your unique id).  Then use the grouping functionality to roll up on your 
unique id whenever you query.  This will mean changes to your application, 
probably a bigger index, and likely somewhat slower querying.  But the 
performance losses might be slight and this seems to me like it maybe would be 
a good solution in your case.  Perhaps it would make it so you wouldn't have to 
entirely re-index each month or so.  See 
http://wiki.apache.org/solr/FieldCollapsing for more information.
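
A minimal SolrJ 3.x sketch of option 1, untested; it assumes every field is
stored, the uniqueKey is "id", and "name" is the multivalued field:

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.common.SolrDocument;
  import org.apache.solr.common.SolrInputDocument;

  SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
  SolrDocument old = server.query(new SolrQuery("id:B0051QVF7A"))
                           .getResults().get(0);
  SolrInputDocument doc = new SolrInputDocument();
  for (String f : old.getFieldNames()) {
      for (Object v : old.getFieldValues(f)) {
          doc.addField(f, v);             // copy every stored value back
      }
  }
  doc.addField("name", "New Kindle");     // append the extra name
  server.add(doc);                        // re-adding replaces by uniqueKey
  server.commit();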

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311





Re: Solr scraping: Nutch and other alternatives.

2011-10-19 Thread Igor MILOVANOVIC
Try this if you haven't used python before:
http://gun.io/blog/python-for-the-web/

Keep in mind that scraping a very well-known search engine is usually
not in line with its ToS, so at the very least they will sooner or
later block you.

Be gentle and polite, and you might even make it work... ;)


On Wed, Oct 19, 2011 at 2:08 PM, Luis Cappa Banda luisca...@gmail.comwrote:

 Do you know any tutorial or book that can teach me the
 first steps?




-- 
Igor Milovanović
http://about.me/igor.milovanovic
http://umotvorine.com/


Re: Solr MultiValue Fields and adding values

2011-10-19 Thread Tiernan OToole

Thanks for the comment. Sounds like too much of a change in all
fairness... I have actually made a tweak to my DB to allow multiple
names, storing them off the main table. my query then only needs to
query the IDs, and then the second table to get the names. but i will
keep the comments in mind and see how things go over the next while.

As a side note, if i were to go down the "get doc from solr, modify,
commit back to solr" route, is it really that simple? run a query on Solr,
get the document, add the extra data, and insert it back into solr?
 
Thanks.
 
--Tiernan
 
On 19/10/2011 15:26, Dyer, James wrote:
 While Solr/Lucene can't support true document updates, there are 2 ways
you might be able to work around this in your situation.

 1. If you store all of the fields, you can write something that will
read back everything already indexed to the document, append whatever
data you want, then write it back. This will increase index size and
possibly make indexing too slow. On the other hand, it might be more
efficient than requiring the database to return everything in order.

 2. You could store your data as multiple documents per id (pick
something else as your unique id). Then use the grouping functionality
to roll up on your unique id whenever you query. This will mean changes
to your application, probably a bigger index, and likely somewhat slower
querying. But the performance losses might be slight and this seems to
me like it maybe would be a good solution in your case. Perhaps it would
make it so you wouldn't have to entirely re-index each month or so. See
http://wiki.apache.org/solr/FieldCollapsing for more information.

 James Dyer
 E-Commerce Systems
 Ingram Content Group
 (615) 213-4311


 -Original Message-
 From: Tiernan OToole [mailto:lsmart...@gmail.com]
 Sent: Wednesday, October 19, 2011 5:11 AM
 To: solr-user@lucene.apache.org
 Cc: Otis Gospodnetic
 Subject: Re: Solr MultiValue Fields and adding values


 I was hoping that wasent going to be the case... I ended up querying for
 all unique IDs in the DB, and then querying for each unique ID and
 getting all names, and then inserting them that way... Seems a lot
 slower than in theory it really should be...

 Thanks.

 --Tiernan

 On 18/10/2011 23:20, Otis Gospodnetic wrote:
  Hi,

  You'll need to construct the whole document and index it as such. You
 can't append values to document fields.

  Otis
  

  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
  Lucene ecosystem search :: http://search-lucene.com/


  
  From: Tiernan OToole lsmart...@gmail.com
  To: solr-user@lucene.apache.org
  Sent: Tuesday, October 18, 2011 11:41 AM
  Subject: Solr MultiValue Fields and adding values
 
  Good morning.
 
  I asked this question on StackOverflow, but though this group may be
  able to help... the question is available on SO here:
 http://bit.ly/r6MAWU
 
  here goes:
 
  I am building a search engine, and have a not so unique ID for a lot of
  different names... So, for example, there could be an id of B0051QVF7A
  which would have multiple names like Kindle Amazon Kindle Amazon
  Kindle 3G Kindle Ebook Reader New Kindle etc.
 
  The problem, and question i have, is that i am trying to enter this
data
  from a DB of 11 ish million rows. each is being read one at a time.
So i
  dont have all the names of each ID. I am adding new documents to the
  list each time.
 
  What i am trying to find out is how do i add names to an existing
  Document? if i am reading documentation correctly, it seems to
overwrite
  the whole document, not add extra info to the field... i just want to
  add an extra name to the document multivalue field...
 
  I know this could cause some weird and wonderful issues if a name is
  removed (in the example above, New Kindle could be removed when a
  newer Kindle gets released) but i am thinking of recreating the index
  every now and again, to clear out issues like that (once a month or so.
  Its taking about 45min currently to create the index).
 
  So, how do you add a value to a multivalue field in solr for an
existing
  document?
 
  Thanks in advance.
 
  --Tiernan
 
 
 


-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.9 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
 
iEYEARECAAYFAk6e5j4ACgkQW5AKVqf62MHcnACbBGtTs25FjGe8Rs7q9DyO0J5r
VnEAnRiPe4KCe717i//aPFiAlYsLwELB
=eqRg
-END PGP SIGNATURE-



Re: Solr MultiValue Fields and adding values

2011-10-19 Thread Tiernan OToole

That's what i thought too... we'll see what the speed difference actually
is... running some tests now...

Thanks for the info!
 
--Tiernan
 
On 19/10/2011 16:07, Dyer, James wrote:

 Not that I am doing this with any of my indexes, but I'm pretty sure
 the "get doc from solr, modify, commit back to solr" approach really is
 that simple. Just be sure you are storing the exact raw data that came
 from your database (typically you would). The problem with this approach
 is it potentially could be very slow if you're updating lots of documents.

 James Dyer
 E-Commerce Systems
 Ingram Content Group
 (615) 213-4311


stemEnglishPossessive and contractions

2011-10-19 Thread Herman Kiefus
We utilize a comprehensive dictionary of English words, place names, surnames, 
male and female first names, ... you get the point.  As such, the possessive 
plural forms of these words are recognized as 'misspelled'.

I simply thought that 'turning on' this option for the WordDelimiterFactory 
would address my concerns; however, I also got an unintended consequence: 
Contractions (isn't, wouldn't, shouldn't, he'll, we'll...) also seem to be 
affected.  Is this intended behavior?  When I read 'English possessive' I hear 
'apostrophe s' and not 'apostrophe anything'.  Is there something I'm missing 
here?


Re: Merging Remote Solr Indexes?

2011-10-19 Thread Otis Gospodnetic
Hi Darren,

http://search-lucene.com/?q=solr+merge&fc_project=Solr


Check hit #1
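
For reference, that hit covers the CoreAdmin mergeindexes command; a sketch,
with the core name and index path made up:

  curl 'http://localhost:8983/solr/admin/cores?action=mergeindexes&core=core0&indexDir=/path/to/copied/index'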

Otis


Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



From: dar...@ontrenet.com dar...@ontrenet.com
To: solr-user@lucene.apache.org
Sent: Wednesday, October 19, 2011 10:04 AM
Subject: Merging Remote Solr Indexes?


Hi,
  I thought of a useful capability if it doesn't already exist.

Is it possible to do an index merge between two remote Solr's?

To handle massive index-time scalability, wouldn't it be useful
to have distributed indexes accepting local input, then merge
them into one central index after?

Darren




java.lang.NoSuchMethodError: org.slf4j.spi.LocationAwareLogger.log

2011-10-19 Thread Tod
I'm working on upgrading to Solr 3.4.0 and am seeing this error in my 
tomcat log.  I'm using the following slf jars:


slf4j-api-1.6.1.jar
slf4j-jdk14-1.6.1.jar

Has anybody run into this?  I can reproduce it doing curl calls to the 
Solr ExtractingRequestHandler ala /solr/update/extract.


TIA - Tod


Re: stemEnglishPossessive and contractions

2011-10-19 Thread Robert Muir
The word delimiter filter also does other things; it treats ' as
punctuation by default. So it normally splits on ', except if it's 's
(in this case it removes the 's completely if you use
stemEnglishPossessive).

There are a couple of approaches you can use:
1. you can keep worddelimiterfilter with this option on, but disable
splitting on ' by customizing its type table. in this case specify
types=mycustomtypes.txt, and in that file specify ' to be treated as
ALPHANUM or similar. see
https://issues.apache.org/jira/browse/SOLR-2059 for some examples of
this. i would only do this if you want worddelimiterfilter for other
purposes; if you just want to remove possessives and don't need
worddelimiterfilter's other features, look below.
2. you can instead use EnglishPossessiveFilterFactory, which only does
this exact thing (removes 's) and nothing else.
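
A minimal schema.xml sketch of option 2; the field type name and tokenizer
are assumptions:

  <fieldType name="text_en" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.EnglishPossessiveFilterFactory"/>
    </analyzer>
  </fieldType>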

On Wed, Oct 19, 2011 at 5:30 PM, Herman Kiefus herm...@angieslist.com wrote:
 We utilize a comprehensive dictionary of English words, place names, 
 surnames, male and female first names, ... you get the point.  As such, the 
 possessive plural forms of these words are recognized as 'misspelled'.

 I simply thought that 'turning on' this option for the WordDelimiterFactory 
 would address my concerns; however, I also got an unintended consequence: 
 Contractions (isn't, wouldn't, shouldn't, he'll, we'll...) also seem to be 
 affected.  Is this intended behavior?  When I read 'English possessive' I 
 hear 'apostrophe s' and not 'apostrophe anything'.  Is there something I'm 
 missing here?




-- 
lucidimagination.com


How to make UnInvertedField faster?

2011-10-19 Thread Michael Ryan
I was wondering if anyone has any ideas for making UnInvertedField.uninvert()
faster, or other alternatives for generating facets quickly.

The vast majority of the CPU time for our Solr instances is spent generating
UnInvertedFields after each commit. Here's an example of one of our slower 
fields:

[2011-10-19 17:46:01,055] INFO125974[pool-1-thread-1] - (SolrCore:440) -
UnInverted multi-valued field 
{field=authorCS,memSize=38063628,tindexSize=422652,
time=15610,phase1=15584,nTerms=1558514,bigTerms=0,termInstances=4510674,uses=0}

That is from an index with approximately 8 million documents. After each commit,
it takes on average about 90 seconds to uninvert all the fields that we facet 
on.

Any ideas at all would be greatly appreciated.

-Michael


dataimport indexing fails: where are my log files ? ;-)

2011-10-19 Thread Fred Zimmerman
dumb question ...

today I set up solr3.4/example, indexing to 8983 via post is working, so is
search, solr/dataimport reports

<str name="Total Rows Fetched">0</str>
<str name="Total Documents Processed">0</str>
<str name="Total Documents Skipped">0</str>
<str name="Full Dump Started">2011-10-19 18:13:57</str>
<str name="">Indexing failed. Rolled back all changes.</str>

Google tells me to look at the exception logs to find out what's happening
... but, I can't find the logs!   Where are they? example/logs is an empty
directory.


Re: dataimport indexing fails: where are my log files ? ;-)

2011-10-19 Thread Shawn Heisey



I believe that if you are running the example Solr without any changes 
related to logging, that information will be dumped to stdout/stderr.  
If you are starting Solr as a daemon or a service, it may be going 
someplace you can't retrieve it.  Start it directly from the commandline 
and/or alter your startup command to redirect stdout/stderr to files.
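
For example, something like this from the example directory:

  cd example
  java -jar start.jar > solr.log 2>&1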


I hope that's actually helpful!

Thanks,
Shawn



Re: java.lang.NoSuchMethodError: org.slf4j.spi.LocationAwareLogger.log

2011-10-19 Thread Tim Terlegård
Hi Tod,

I had a similar issue with slf4j, but it was NoClassDefFound. Do you
have some other dependencies in your application that use some other
version of slf4j? You can use "mvn dependency:tree" to get all the
dependencies in your application. Or maybe there's some other version
already in your tomcat or application server.
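
For example, assuming a Maven build and a Unix-ish shell:

  mvn dependency:tree | grep slf4j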

/Tim

2011/10/19 Tod listac...@gmail.com:
 I'm working on upgrading to Solr 3.4.0 and am seeing this error in my tomcat
 log.  I'm using the following slf jars:

 slf4j-api-1.6.1.jar
 slf4j-jdk14-1.6.1.jar

 Has anybody run into this?  I can reproduce it doing curl calls to the Solr
 ExtractingRequestHandler ala /solr/update/extract.

 TIA - Tod



RE: stemEnglishPossessive and contractions

2011-10-19 Thread Herman Kiefus
Thanks Robert, exactly what I was looking for.



where is solr data import handler looking for my file?

2011-10-19 Thread Fred Zimmerman
Solr dataimport is reporting "file not found" when it looks for foo.xml.

Where is it looking for /data? Is this a URL off the apache2/htdocs on the
server, or is it a URL within example/solr/...?


 <entity name="page"
         processor="XPathEntityProcessor"
         stream="true"
         forEach="/mediawiki/page/"
         url="/data/foo.xml"
         transformer="RegexTransformer,DateFormatTransformer">



Re: Merging Remote Solr Indexes?

2011-10-19 Thread Darren Govoni

Hi Otis,
   Yeah, I saw that page, but it says it's for merging cores, which I presume 
must reside locally to the solr instance doing the merging?
What I'm interested in doing is merging across solr instances running on 
different machines into a single solr running on
another machine (programmatically). Is it still possible or did I 
misread the wiki?


Thanks!
Darren

On 10/19/2011 11:57 AM, Otis Gospodnetic wrote:

Hi Darren,

http://search-lucene.com/?q=solr+merge&fc_project=Solr


Check hit #1

Otis


Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/




From: dar...@ontrenet.comdar...@ontrenet.com
To: solr-user@lucene.apache.org
Sent: Wednesday, October 19, 2011 10:04 AM
Subject: Merging Remote Solr Indexes?


Hi,
   I thought of a useful capability if it doesn't already exist.

Is it possible to do an index merge between two remote Solr's?

To handle massive index-time scalability, wouldn't it be useful
to have distributed indexes accepting local input, then merge
them into one central index after?

Darren







RE: how was developed solr admin page and the UI part?

2011-10-19 Thread Jaeger, Jay - DOT
I believe that if you have the Solr distribution, you have the source for the 
web UI already: it is just .jsp pages.  They are inside the solr .war file.

JRJ

-Original Message-
From: nagarjuna [mailto:nagarjuna.avul...@gmail.com] 
Sent: Wednesday, October 19, 2011 12:07 AM
To: solr-user@lucene.apache.org
Subject: how was developed solr admin page and the UI part?

Hi everybody...

  i would like to know how the solr admin page and the total UI part were
developed. i would like to download the source code of the solr UI part. can
anybody send me the links please

 Thanx in advance

--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-was-developed-solr-admin-page-and-the-UI-part-tp3433345p3433345.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: OS Cache - Solr

2011-10-19 Thread Jaeger, Jay - DOT
200 instances of what?  The Solr application with lucene, etc. per usual?  Solr 
cores? ???

Either way, 200 seems to be very very very many: unusually so.  Why so many?

If you have 200 instances of Solr in a 20 GB JVM, that would only be 100MB per 
Solr instance.  

If you have 200 instances of Solr all accessing the same physical disk, the 
results are not likely to be satisfactory - the disk head will go nuts trying 
to handle all of the requests.

JRJ

-Original Message-
From: Sujatha Arun [mailto:suja.a...@gmail.com] 
Sent: Wednesday, October 19, 2011 12:25 AM
To: solr-user@lucene.apache.org; Otis Gospodnetic
Subject: Re: OS Cache - Solr

Thanks, Otis,

This is our Solr cache allocation. We have the same cache allocation for all
our 200+ instances on the single server. Is this too high?

Query Result Cache: LRU Cache(maxSize=16384, initialSize=4096,
autowarmCount=1024)

Document Cache: LRU Cache(maxSize=16384, initialSize=16384)

Filter Cache: LRU Cache(maxSize=16384, initialSize=4096,
autowarmCount=4096)

Regards
Sujatha

On Wed, Oct 19, 2011 at 4:05 AM, Otis Gospodnetic 
otis_gospodne...@yahoo.com wrote:

 Maybe your Solr Document cache is big and that's consuming a big part of
 that JVM heap?
 If you want to be able to run with a smaller heap, consider making your
 caches smaller.

 Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/


 
 From: Sujatha Arun suja.a...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Tuesday, October 18, 2011 12:53 AM
 Subject: Re: OS Cache - Solr
 
 Hello Jan,
 
 Thanks for your response and  clarification.
 
  We are monitoring the JVM cache utilization and we are currently using
  about 18 GB of the 20 GB assigned to the JVM. Our total index size is
  abt 14GB.
 
 Regards
 Sujatha
 
 On Tue, Oct 18, 2011 at 1:19 AM, Jan Høydahl jan@cominvent.com
 wrote:
 
   Hi Sujatha,
  
   Are you sure you need 20Gb for Tomcat? Have you profiled using JConsole or
   similar? Try with 15Gb and see how it goes. The reason why this is
   beneficial is that you WANT your OS to have available memory for disk
   caching. If you have 17Gb free after starting Solr, your OS will be able to
   cache all index files in memory and you get very high search performance.
   With your current settings, there is only 12Gb free for both caching the
   index and for your MySql activities.  Chances are that when you backup
   MySql, the cached part of your Solr index gets flushed from disk caches and
   needs to be re-cached later.
  
   How to interpret memory stats varies between OSes, and seeing 163Mb free
   may simply mean that your OS has used most RAM for various caches and
   paging, but will flush it once an application asks for more memory. Have
   you seen http://wiki.apache.org/solr/SolrPerformanceFactors ?
  
   You should also slim down your index maximally by setting stored=false and
   indexed=false wherever possible. I would also upgrade to a more current
   Solr version.
  
   --
   Jan Høydahl, search solution architect
   Cominvent AS - www.cominvent.com
   Solr Training - www.solrtraining.com
 
  On 17. okt. 2011, at 19:51, Sujatha Arun wrote:
 
    Hello
   
    I am trying to understand the OS cache utilization of Solr. Our server
    has several solr instances on a server. The total combined index size of
    all instances is abt 14 Gb and the size of the maximum single index is
    abt 2.5 GB.
   
    Our server has a quad processor with 32 GB RAM, out of which 20 GB has
    been assigned to the JVM. We are running solr1.3 on tomcat 5.5 and Java
    1.6.
   
    Our current statistics indicate that solr uses 18-19 GB of the 20 GB RAM
    assigned to the JVM. However the free physical memory seems to remain
    constant, as below:
    Free physical memory = 163 Mb
    Total physical memory = 32,232 Mb
   
    The server also serves as a backup server for Mysql, where the
    application DB is backed up and restored. During this activity we see
    lots of queries that take even 10+ minutes to execute. But otherwise the
    maximum query time is less than 1-2 secs.
   
    The physical memory that is free seems to be constant. Why is this
    constant, and how will this be used between the Mysql backup and solr
    while backup activity is happening? How much free physical memory should
    be available to the OS given our stats?
   
    Any pointers would be helpful.
   
    Regards
    Sujatha
 
 
 
 
 



RE: How to update document with solrj?

2011-10-19 Thread Jaeger, Jay - DOT
Solr does not have an update per se:  you have to re-add the document.  A 
document with the same value for the field defined as the uniqueKey will 
replace any existing document with that key (you do not have to query and 
explicitly delete it first).
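
A minimal SolrJ sketch of that, with made-up field names, assuming "id" is
the uniqueKey and an existing SolrServer "server":

  import org.apache.solr.common.SolrInputDocument;

  SolrInputDocument doc = new SolrInputDocument();
  doc.addField("id", "doc-42");             // same uniqueKey as before
  doc.addField("description", "new text");  // plus all the other fields again
  server.add(doc);                          // replaces the old document
  server.commit();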

JRJ

-Original Message-
From: hadi [mailto:md.anb...@gmail.com] 
Sent: Wednesday, October 19, 2011 12:50 AM
To: solr-user@lucene.apache.org
Subject: How to update document with solrj?

I have indexed some files that do not have any tag or description, and i want
to add some fields without deleting them. how can i update or add info to my
indexed files with solrj?
my idea for this issue is to query for the specific file, delete it, add some
info and re-index it, but i think that is not a good idea


--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-update-document-with-solrj-tp3433434p3433434.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: add thumbnail image for search result

2011-10-19 Thread Jaeger, Jay - DOT
It won't do it for you automatically.  I suppose you might create the thumbnail 
image beforehand, Base64 encode it, and add it as a stored, non-indexed, binary 
field (see schema: solr.BinaryField) when you index the document.
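
A sketch of such a field in schema.xml; the names are made up:

  <fieldType name="binary" class="solr.BinaryField"/>
  <field name="thumbnail" type="binary" indexed="false" stored="true"/>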

JRJ

-Original Message-
From: hadi [mailto:md.anb...@gmail.com] 
Sent: Wednesday, October 19, 2011 12:54 AM
To: solr-user@lucene.apache.org
Subject: add thumbnail image for search result

I want to know how can i add thumbnail image for my files when i am indexing
files with solrj?
thanks


--
View this message in context: 
http://lucene.472066.n3.nabble.com/add-thumnail-image-for-search-result-tp3433440p3433440.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Optimization /Commit memory

2011-10-19 Thread Jaeger, Jay - DOT
Commit does not particularly spike disk or memory usage, unless you are adding 
a very large number of documents between commits.  A commit can cause a need to 
merge indexes, which can increase disk space temporarily.  An optimize is 
*likely* to merge indexes, which will usually increase disk space temporarily.

How much disk space depends very much upon how big your index is in the first 
place.  A 2 to 3 times factor of the sum of your peak index file size seems 
safe, to me.

Solr uses only modest amounts of memory for the JVM for this stuff. 

JRJ

-Original Message-
From: Sujatha Arun [mailto:suja.a...@gmail.com] 
Sent: Wednesday, October 19, 2011 4:04 AM
To: solr-user@lucene.apache.org
Subject: Optimization /Commit memory

Do we require  2 or 3 Times OS RAM memory or  Hard Disk Space while
performing Commit or Optimize or Both?

what is the requirement in terms of  size of RAM and HD for commit and
Optimize

Regards
Sujatha


Re: Merging Remote Solr Indexes?

2011-10-19 Thread Otis Gospodnetic
Darren,

No, that is not possible without one copying an index/shard to a single machine 
on which you would then merge indices as described on the Wiki.

Hmmm, wouldn't it be nice to make use of existing replication code to make it 
possible to move shards around the cluster?

Otis


Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



From: Darren Govoni dar...@ontrenet.com
To: solr-user@lucene.apache.org
Sent: Wednesday, October 19, 2011 5:15 PM
Subject: Re: Merging Remote Solr Indexes?

Hi Otis,
    Yeah, I saw page, but it says for merging cores, which I presume 
must reside locally to the solr instance doing the merging?
What I'm interested in doing is merging across solr instances running on 
different machines into a single solr running on
another machine (programmatically). Is it still possible or did I 
misread the wiki?

Thanks!
Darren

On 10/19/2011 11:57 AM, Otis Gospodnetic wrote:
 Hi Darren,

 http://search-lucene.com/?q=solr+merge&fc_project=Solr


 Check hit #1

 Otis
 

 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/


 
 From: dar...@ontrenet.comdar...@ontrenet.com
 To: solr-user@lucene.apache.org
 Sent: Wednesday, October 19, 2011 10:04 AM
 Subject: Merging Remote Solr Indexes?


 Hi,
    I thought of a useful capability if it doesn't already exist.

 Is it possible to do an index merge between two remote Solr's?

 To handle massive index-time scalability, wouldn't it be useful
 to have distributed indexes accepting local input, then merge
 them into one central index after?

 Darren








Re: Merging Remote Solr Indexes?

2011-10-19 Thread Darren Govoni
Actually, yeah. If you think about it a remote merge is like the inverse 
of replication.
Where replication is a one to many away from an index, the inverse would 
be merging many back to the one.

Sorta like a recall.

I think it would be a great analog to replication.

On 10/19/2011 06:18 PM, Otis Gospodnetic wrote:

Darren,

No, that is not possible without one copying an index/shard to a single machine 
on which you would then merge indices as described on the Wiki.

Hmmm, wouldn't it be nice to make use of existing replication code to make it 
possible to move shards around the cluster?

Otis


Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/




From: Darren Govonidar...@ontrenet.com
To: solr-user@lucene.apache.org
Sent: Wednesday, October 19, 2011 5:15 PM
Subject: Re: Merging Remote Solr Indexes?

Hi Otis,
 Yeah, I saw page, but it says for merging cores, which I presume
must reside locally to the solr instance doing the merging?
What I'm interested in doing is merging across solr instances running on
different machines into a single solr running on
another machine (programmatically). Is it still possible or did I
misread the wiki?

Thanks!
Darren

On 10/19/2011 11:57 AM, Otis Gospodnetic wrote:

Hi Darren,

http://search-lucene.com/?q=solr+merge&fc_project=Solr


Check hit #1

Otis


Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/




From: dar...@ontrenet.comdar...@ontrenet.com
To: solr-user@lucene.apache.org
Sent: Wednesday, October 19, 2011 10:04 AM
Subject: Merging Remote Solr Indexes?


Hi,
 I thought of a useful capability if it doesn't already exist.

Is it possible to do an index merge between two remote Solr's?

To handle massive index-time scalability, wouldn't it be useful
to have distributed indexes accepting local input, then merge
them into one central index after?

Darren











RE: how was developed solr admin page and the UI part?

2011-10-19 Thread nagarjuna
Thank u for ur reply jaeger... i saw that and i would like to use
that jsp code; i thought to modify the solr UI a little bit as per user
convenience. now my question is: is it possible to develop that using
spring mvc architecture?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-was-developed-solr-admin-page-and-the-UI-part-tp3433345p3436737.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Optimization /Commit memory

2011-10-19 Thread Sujatha Arun
Thanks Jay,

I was trying to compute the OS RAM requirement, not the JVM RAM, for a 14 GB
index [the cumulative index size of all instances]. And I put it thus -

The operating system RAM requirement for an index of 14GB is: index size
+ 3 times the maximum index size of an individual instance, for optimize.

That is to say, I have several instances whose combined index size is 14GB.
The maximum individual index size is 2.5GB, so my requirement for OS RAM is
14GB + 3 * 2.5GB ~= 21.5GB, i.e. about 22GB.

Correct?

Regards
Sujatha



On Thu, Oct 20, 2011 at 3:45 AM, Jaeger, Jay - DOT jay.jae...@dot.wi.govwrote:

 Commit does not particularly spike disk or memory usage, unless you are
 adding a very large number of documents between commits.  A commit can cause
 a need to merge indexes, which can increase disk space temporarily.  An
 optimize is *likely* to merge indexes, which will usually increase disk
 space temporarily.

 How much disk space depends very much upon how big your index is in the
 first place.  A 2 to 3 times factor of the sum of your peak index file size
 seems safe, to me.

 Solr uses only modest amounts of memory for the JVM for this stuff.

 JRJ

 -Original Message-
 From: Sujatha Arun [mailto:suja.a...@gmail.com]
 Sent: Wednesday, October 19, 2011 4:04 AM
 To: solr-user@lucene.apache.org
 Subject: Optimization /Commit memory

 Do we require  2 or 3 Times OS RAM memory or  Hard Disk Space while
 performing Commit or Optimize or Both?

 what is the requirement in terms of  size of RAM and HD for commit and
 Optimize

 Regards
 Sujatha



Re: OS Cache - Solr

2011-10-19 Thread Sujatha Arun
Yes, 200 individual Solr instances, not solr cores.

We get an avg response time of below 1 sec.

The number of documents is not many in most of the instances; some of the
instances have about 5 lac (500,000) documents on average.

Regards
Sujatha

On Thu, Oct 20, 2011 at 3:35 AM, Jaeger, Jay - DOT jay.jae...@dot.wi.govwrote:

 200 instances of what?  The Solr application with lucene, etc. per usual?
  Solr cores? ???

 Either way, 200 seems to be very very very many: unusually so.  Why so
 many?

 If you have 200 instances of Solr in a 20 GB JVM, that would only be 100MB
 per Solr instance.

 If you have 200 instances of Solr all accessing the same physical disk, the
 results are not likely to be satisfactory - the disk head will go nuts
 trying to handle all of the requests.

 JRJ

 -Original Message-
 From: Sujatha Arun [mailto:suja.a...@gmail.com]
 Sent: Wednesday, October 19, 2011 12:25 AM
 To: solr-user@lucene.apache.org; Otis Gospodnetic
 Subject: Re: OS Cache - Solr

 Thanks, Otis,

 This is our Solr cache allocation. We have the same cache allocation for
 all our 200+ instances on the single server. Is this too high?

 Query Result Cache: LRU Cache(maxSize=16384, initialSize=4096,
 autowarmCount=1024)

 Document Cache: LRU Cache(maxSize=16384, initialSize=16384)

 Filter Cache: LRU Cache(maxSize=16384, initialSize=4096,
 autowarmCount=4096)

 Regards
 Sujatha
