Indexing multiple documents in Solr/SolrCell

2009-11-16 Thread Kerwin
Hi,

I am new to this forum and would like to know if the functionality described
below has been developed or already exists in Solr. If it does not exist, is it a
good idea, and can I contribute it?

We need to index multiple documents with different formats. So we use Solr
with Tika (Solr Cell).

Question:
Can you index both metadata and content for multiple documents iteratively
in Solr?
For example, I have an XML file with metadata and links to the documents'
content. There are many documents in this XML and I would like to index them
all without firing multiple URLs.

Example of the XML:
<add>
  <doc>
    <field name="id">34122</field>
    <field name="author">Michael</field>
    <field name="size">3MB</field>
    <field name="URL">URL of the document</field>
  </doc>
  <doc>...</doc> ... <doc>...</doc>   (doc2 through docN)
</add>

I need to index all these documents by sending this XML in a single request. The
collection of documents to be indexed could be on a file system.

I have altered the Solr code to be able to do this, but is there an already
existing feature?
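
For reference, the metadata-only part of this can already be done in one request;
a minimal SolrJ sketch (assuming a Solr 1.4 instance at a hypothetical localhost
URL; it does not cover extracting content from the linked files, which is the new
part being asked about):

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BatchIndexer {
    public static void main(String[] args) throws Exception {
        // Hypothetical host/core; adjust to your own setup.
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        List<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "34122");
        doc.addField("author", "Michael");
        doc.addField("size", "3MB");
        doc.addField("URL", "URL of the document");
        docs.add(doc);
        // ... build and add doc2 .. docN the same way ...

        server.add(docs);   // one HTTP request for the whole batch
        server.commit();
    }
}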


Re: Tika trouble

2009-11-16 Thread Markus Jelsma - Buyways B.V.
Does anyone have a clue?



 List,
 
 
 I somehow fail to index certain PDF files using the
 ExtractingRequestHandler in Solr 1.4 with the default solrconfig.xml but a
 modified schema. I have a very simple schema for this case, using only
 an ID field, a timestamp field and two dynamic fields, ignored_* and
 attr_*, both indexed, stored, multivalued strings. They are
 multivalued simply because some HTML files fail when storing multiple
 hyperlinks.
 
 I have posted multiple files to
 http://.../update/extract?literal.id=doc1 including:
 1. the whitepaper at
 http://www.lucidimagination.com/whitepaper/whats-new-in-lucene-2-9?sc=AP
 2. the html file of the frontpage of http://nu.nl/
 3. another pdf at
 http://www.google.nl/url?sa=tsource=webct=rescd=1ved=0CAcQFjAAurl=http%3A%2F%2Fcsl.stanford.edu%2F~christos%2Fpublications%2F2007.cmp_mapreduce.hpca.pdfrct=jq=2007.cmp_mapreduce.hpca.pdfei=PPz7SpiiOM6l4QbZjKjRAwusg=AFQjCNHs-olxbUQrGCXpNMHfcZvY8aMk8A
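
 (A minimal SolrJ sketch of the same kind of POST to the ExtractingRequestHandler,
 with a hypothetical local file name; in SolrJ 1.4 addFile takes only the File,
 later versions also want a content type:)

import java.io.File;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class ExtractPost {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
        req.addFile(new File("whats-new-in-lucene-2-9.pdf"));  // the file being sent
        req.setParam("literal.id", "doc1");                    // same literal.id as above
        req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);

        server.request(req);
    }
}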
 
 For each document I have the corresponding select/?q=*:* response:
 
 
 1. No text? Should I see something?
 
 <doc>
   <str name="id">doc1</str>
   <arr name="ignored_content_type">
     <str>application/octet-stream</str>
   </arr>
   <arr name="ignored_stream_content_type">
     <str>text/xml; charset=UTF-8; boundary=cf57b4ad644d</str>
   </arr>
   <arr name="ignored_stream_size">
     <str>491238</str>
   </arr>
   <arr name="ignored_text">
     <str/>
   </arr>
   <date name="timestamp">2009-11-12T12:17:23.016Z</date>
 </doc>
 
 
 2. Plenty of data, this seems to be ok
 
 <doc>
   <str name="id">doc1</str>
   <arr name="ignored_content_type">
     <str>application/xhtml+xml</str>
   </arr>
   <arr name="ignored_links">
     <str>http://www.nu.nl/</str>
     <str>http://www.nu.nl/</str>
     <str>http://www.nu.nl/algemeen/</str>
     <str>http://www.nu.nl/economie/</str>
     ...
   </arr>
   <arr name="ignored_stream_content_type">
     <str>text/xml; charset=UTF-8; boundary=b6e44d087bdd</str>
   </arr>
   <arr name="ignored_stream_size">
     <str>36991</str>
   </arr>
   <arr name="ignored_text">
     <str>A LOT OF TEXT HERE</str>
   </arr>
   <date name="timestamp">2009-11-12T12:19:15.415Z</date>
 </doc>
 
 
 3. a lot of garbage
 
 <doc>
   <str name="id">doc1</str>
   <arr name="ignored_content_encoding">
     <str>windows-1252</str>
   </arr>
   <arr name="ignored_content_language">
     <str>fr</str>
   </arr>
   <arr name="ignored_content_type">
     <str>text/plain</str>
   </arr>
   <arr name="ignored_language">
     <str>fr</str>
   </arr>
   <arr name="ignored_stream_content_type">
     <str>text/xml; charset=UTF-8; boundary=83df0fd4d358</str>
   </arr>
   <arr name="ignored_stream_size">
     <str>361458</str>
   </arr>
   <arr name="ignored_text">
     <str>
     A LOT OF GARBAGE HERE including
 
 ió½·Þp™ó 4­0› 
 š©xÓ ^CøùI3람š³î¨V ÚÜ¡yS4 ¹£ ² ›H 6õɨ5¤ ÅÜ磩bädÒøŸ\ �s%OîÐÙIÑYRäŠ ;4
 ¢9r —!rEôˆÌ {SìûD²à £©ïœ«{‘ínÆ N÷ô¥F»�™ ±¡Ë'ú\³=·m„Þ »ý)³Å=j¶B¢)`  Ñ
 „Ï™hjCu{£É5{¢¯ç6½Ñhr¢ºÃ=J M- AqsøtÜì ÿ^Rl S?¿óšM‰—lv‘Ø›Qüãý´ þžŽ
 $S;¾¦wze³Ù)qÉú§ ‰› ãqó…Ó ‰ªU:šBÝ‘GuŠë
 MM±Òv �~ ‚N‹t¢ä§~Ì ÞŒS—Êòö¼ÊÄQaº¸¿7tñ ¾Áç œãØŒ58$O 3Å~�8¿L  ‡ëŽó©pk_
 Ša Â=u×; (ä�...@.œ÷ä ù° µk+ÿ PP~ ¨*ݤ¿Œ™¡D»   @fI$0°�Î Ù·p“Œ,Øâ  †¶v
 ¤v1#8¼0 ›  èð€-†šZ 6¾  ! ñb ˆbˆ¤v)LS)T X² ¬ l...@€  6E$Q
 endstream
 endobj
 137 0
 obj/Type/Encoding/BaseEncoding/WinAnsiEncoding/Differences[1/W/o/r/d/C/u/n/t/M/a/i/x/l/S/g/c/h/K/m/e/s/R/v/I/P/A/H/L/space/p]
 endobj
 138 0 obj/Type/FontDescriptor/FontFile2 136 0 R/FontBBox[0 -210 942
 728]/FontName/WQHWKD+TTE31911E0t00/Flags 4/MissingWidth 750/StemV
 141/CapHeight 728/Ascent 728/Descent -210/ItalicAngle 0
 endobj
 139 0 obj/Count 12/Kids[140 0 R 141 0 R]/Type/Pages
 endobj
 140 0 obj/Count 6/Kids[147 0 R 1 0 R 4 0 R 7 0 R 22 0 R 25 0
 R]/Type/Pages/Parent 139 0 R
 endobj
 141 0 obj/Count 6/Kids[39 0 R 42 0 R 45 0 R 82 0 R 92 0 R 122 0
 R]/Type/Pages/Parent 
 
 
 
     </str>
   </arr>
   <date name="timestamp">2009-11-12T12:21:28.306Z</date>
 </doc>
 
 
 Any ideas? Why doesn't the whitepaper produce any results, and why is the
 other PDF full of garbage? At least I'm happy that HTML works
 fine.
 
 
 
 Regards,
 
 -  
 Markus Jelsma  Buyways B.V.
 Technisch ArchitectFriesestraatweg 215c
 http://www.buyways.nl  9743 AD Groningen   
 
 
 Alg. 050-853 6600  KvK  01074105
 Tel. 050-853 6620  Fax. 050-3118124
 Mob. 06-5025 8350  In: http://www.linkedin.com/in/markus17
 


Re: Tika trouble

2009-11-16 Thread Antonio Calò
If you want to index a PDF, then you should use a PDF extractor, which can
extract both the text content and the metadata of the file. I suppose you have
just opened and indexed the PDF as-is, so you stored binary data and nothing
more. For my application I've used PdfExtractor, but the PDFBox project could
be used as well.
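
A rough PDFBox sketch of that idea (the file name is hypothetical, and the
package names are from the 1.x releases; newer PDFBox releases moved
PDFTextStripper to org.apache.pdfbox.text):

import java.io.File;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;

public class PdfText {
    public static void main(String[] args) throws Exception {
        // Load the PDF and pull out its plain text.
        PDDocument pdf = PDDocument.load(new File("some.pdf"));
        try {
            String text = new PDFTextStripper().getText(pdf);
            System.out.println(text);  // index this text, not the raw PDF bytes
        } finally {
            pdf.close();
        }
    }
}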

Antonio


-- 
Antonio Calò

Re: Indexing multiple documents in Solr/SolrCell

2009-11-16 Thread Sascha Szott

Hi,

the problem you've described -- an integration of DataImportHandler (to 
traverse the XML file and get the document urls) and Solr Cell (to 
extract content afterwards) -- is already addressed in issue SOLR-1358 
(https://issues.apache.org/jira/browse/SOLR-1358).


Best,
Sascha






Re: javabin in .NET?

2009-11-16 Thread Mauricio Scheffer
Yep, I think I mostly nailed the unmarshalling. Need more tests though, and
then integrate it into SolrNet.
Is there any way (or are there any plans) to have an update handler that
accepts javabin?

2009/11/16 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

 start with a JavabinDecoder only so that the class is simple to start with.

 2009/11/16 Noble Paul നോബിള്‍  नोब्ळ् noble.p...@corp.aol.com:
  For a client the marshal() part is not important. unmarshal() is
  probably all you need.
 
  On Sun, Nov 15, 2009 at 12:36 AM, Mauricio Scheffer
  mauricioschef...@gmail.com wrote:
  Original code is here: http://bit.ly/hkCbI
  I just started porting it here: http://bit.ly/37hiOs
  It needs: tests/debugging, porting NamedList, SolrDocument,
 SolrDocumentList
  Thanks for any help!
 
  Cheers,
  Mauricio
 
  2009/11/14 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com
 
  OK. Is there anyone trying it out? where is this code ? I can try to
 help
  ..
 
  On Fri, Nov 13, 2009 at 8:10 PM, Mauricio Scheffer
  mauricioschef...@gmail.com wrote:
   I meant the standard IO libraries. They are different enough that the
  code
   has to be manually ported. There were some automated tools back when
   Microsoft introduced .Net, but IIRC they never really worked.
  
   Anyway it's not a big deal, it should be a straightforward job.
 Testing
  it
   thoroughly cross-platform is another thing though.
  
   2009/11/13 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com
  
    The javabin format does not have many dependencies. It may have 3-4
    classes and that is it.
  
   On Fri, Nov 13, 2009 at 6:05 PM, Mauricio Scheffer
   mauricioschef...@gmail.com wrote:
Nope. It has to be manually ported. Not so much because of the
  language
itself but because of differences in the libraries.
   
   
2009/11/13 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com
   
 Is there any tool to directly port Java to .NET? Then we can extract
 out the client part of the javabin code and convert it.
   
On Thu, Nov 12, 2009 at 9:56 PM, Erik Hatcher 
  erik.hatc...@gmail.com
wrote:
 Has anyone looked into using the javabin response format from
 .NET
(instead
 of SolrJ)?

 It's mainly a curiosity.

 How much better could performance/bandwidth/throughput be?  How
   difficult
 would it be to implement some .NET code (C#, I'd guess being
 the
  best
 choice) to handle this response format?

 Thanks,
Erik


   
   
   
--
-
Noble Paul | Principal Engineer| AOL | http://aol.com
   
   
  
  
  
   --
   -
   Noble Paul | Principal Engineer| AOL | http://aol.com
  
  
 
 
 
  --
  -
  Noble Paul | Principal Engineer| AOL | http://aol.com
 
 
 
 
 
  --
  -
  Noble Paul | Principal Engineer| AOL | http://aol.com
 



 --
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com



Re: Tika trouble

2009-11-16 Thread Markus Jelsma - Buyways B.V.
Thank you for your reply.

I had the assumption Tika could also extract text content from various
document types instead of only metadata. I'll use the CLI tools from
http://www.foolabs.com/xpdf/ to extract text manually.
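
One way to drive the xpdf CLI from Java is to run pdftotext and read its
stdout (a sketch; assumes the pdftotext binary is on the PATH, and the file
name is hypothetical):

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class PdfToTextRunner {
    public static void main(String[] args) throws Exception {
        // "-" tells pdftotext to write the extracted text to stdout.
        Process p = new ProcessBuilder("pdftotext", "some.pdf", "-").start();
        BufferedReader r =
            new BufferedReader(new InputStreamReader(p.getInputStream(), "UTF-8"));
        StringBuilder text = new StringBuilder();
        String line;
        while ((line = r.readLine()) != null) {
            text.append(line).append('\n');
        }
        p.waitFor();
        System.out.println(text);  // feed this text to Solr instead of the raw PDF
    }
}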


-  
Markus Jelsma  Buyways B.V.
Technisch ArchitectFriesestraatweg 215c
http://www.buyways.nl  9743 AD Groningen   


Alg. 050-853 6600  KvK  01074105
Tel. 050-853 6620  Fax. 050-3118124
Mob. 06-5025 8350  In: http://www.linkedin.com/in/markus17



EmbeddedSolrServer: java.lang.NoClassDefFoundError: javax/servlet/ServletRequest

2009-11-16 Thread Leonardo Souza
Hi,

I'm a newbie using Solr and I'd like to run some tests against our data set. I
have successfully tested Solr + Cell using the standard HTTP Solr server,
and now we need to test the Embedded solution. When I try to start the
embedded server I get this exception:

INFO: registering core:
Exception in thread Thread-1 java.lang.NoClassDefFoundError:
javax/servlet/ServletRequest
at
org.apache.solr.servlet.SolrRequestParsers.<init>(SolrRequestParsers.java:94)
at
org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.<init>(EmbeddedSolrServer.java:90)
at petrobras.ep.solrindexer.Embedded$1.run(Embedded.java:25)
Caused by: java.lang.ClassNotFoundException: javax.servlet.ServletRequest
at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)

Does EmbeddedSolrServer depend on servlet-api?
I'm also facing a lack of documentation about EmbeddedSolrServer; is
http://wiki.apache.org/solr/Solrj#EmbeddedSolrServer all there is?

thanks in advance!

[ ]'s
Leonardo da S. Souza
°v°   Linux user #375225
/(_)\   http://counter.li.org/
^ ^


Experiences from migrating from FAST to Solr

2009-11-16 Thread Morten Tvenning

We'd like to share with the solr users a recent news item from http://sesat.no

Sesam has spent some three months migrating all its indexes from FAST to 
Solr+Lucene.
It was a joyful experience and allowed us to implement a number of improvements 
we never could under FAST.

We've written a review of the whole process to help others wishing to take the
same steps.
http://sesat.no/moving-from-fast-to-solr-review.html


And we've released (under the LGPLv3 license) our own Sesat document processing 
framework that is compatible with the FAST document processing framework.
http://sesat.no/documentprocessor.html

mrtn



RE: solr stops running periodically

2009-11-16 Thread Fuad Efendi
 By that I mean that the java/tomcat  
 process just disappears. 


I had a similar problem when I started Tomcat via SSH and then closed SSH
improperly, without an exit command.

In some cases (OutOfMemory) there is not enough memory to generate a log (or
the CPU can be so overloaded by the Garbage Collector that you would have to
wait a few days until the log is generated) - but the process can't just
disappear...

A process can't simply disappear... if it is a JVM crash you should see a dump
file (you may need to set a specific JVM option to generate a dump file in
case of a crash).





 -Original Message-
 From: athir nuaimi [mailto:at...@nuaim.com]
 Sent: November-15-09 1:46 PM
 To: solr-user@lucene.apache.org
 Subject: solr stops running periodically
 
  We have 4 machines running solr.  On one of the machines, every 2-3
  days solr stops running.  By that I mean that the java/tomcat
  process just disappears.  If I look at the catalina logs, I see
  normal log entries and then nothing.  There is no shutdown messages
  like you would normally see if you sent a SIGTERM to the process.
 
  Obviously this is a problem. I'm new to solr/java so if there are
  more diagnostic things I can do I'd appreciate any tips/advice.
 
  thanks in advance
  Athir





Solr - Load Increasing.

2009-11-16 Thread kalidoss

Hi All.

   My Solr server box's CPU utilization is increasing to between 60 and 90%, and
sometimes Solr goes down and we have to restart it manually.


   No. of documents in Solr: 30 laks.
   No. of add/update requests to Solr: 30 thousand/day, i.e. around 500 writes
every 30 minutes.

   No. of search requests: 9 laks/day.
   Size of the data directory: 4 GB.


   My system RAM is 8 GB.
   System available disk space: 12 GB.
   Processor family: Pentium Pro

   Our Solr data size could increase to around 90 laks, and writes per day
would be around 1 lak.   - I hope this is possible with Solr.


   For write commits I have configured:
   <autoCommit>
     <maxDocs>1</maxDocs>
     <maxTime>10</maxTime>
   </autoCommit>

   Is all of the above possible - 90 laks of data, 1 lak writes per day
and 30 laks reads per day? If yes, what type of system configuration
would it require?


   Please suggest us.

thanks,
Kalidoss.m,
  





Re: DataImportHandler Questions-Load data in parallel and temp tables

2009-11-16 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Mon, Nov 16, 2009 at 6:25 PM, amitj am...@ieee.org wrote:

 Is there also a way we can include some kind of annotation on the schema
 field and send the data retrieved for that field to an external application.
 We have a requirement where we require some data fields (out of the fields
 for an entity defined in data-config.xml) to act as entities for entity
 extraction and auto complete purposes and we are using some external
 application.
No. it is not possible in Solr now.


 Noble Paul നോബിള്‍  नोब्ळ् wrote:

 writing to a remote Solr through SolrJ is in the cards. I may even
 take it up after 1.4 release. For now your best bet is to override the
 class SolrWriter and override the corresponding methods for
 add/delete.

 2009/4/27 Amit Nithian anith...@gmail.com:
  All,
  I have a few questions regarding the data import handler. We have some
  pretty gnarly SQL queries to load our indices and our current loader
  implementation is extremely fragile. I am looking to migrate over to
 the
  DIH; however, I am looking to use SolrJ + EmbeddedSolr + some custom
 stuff
  to remotely load the indices so that my index loader and main search
 engine
  are separated.
  Currently, unless I am missing something, the data gathering from the
 entity
  and the data processing (i.e. conversion to a Solr Document) is done
  sequentially and I was looking to make this execute in parallel so
 that I
  can have multiple threads processing different parts of the resultset
 and
  loading documents into Solr. Secondly, I need to create temporary
 tables
 to
  store results of a few queries and use them later for inner joins was
  wondering how to best go about this?
 
  I am thinking to add support in DIH for the following:
  1) Temporary tables (maybe call it temporary entities)? --Specific
 only
 to
  SQL though unless it can be generalized to other sources.
  2) Parallel support
   - Including some mechanism to get the number of records (whether it
 be
  count or the MAX(custom_id)-MIN(custom_id))
  3) Support in DIH or Solr to post documents to a remote index (i.e.
 create a
  new UpdateHandler instead of DirectUpdateHandler2).
 
  If any of these exist or anyone else is working on this (OR you have
 better
  suggestions), please let me know.
 
  Thanks!
  Amit
 



 --

 -





 --
 --Noble Paul



 --
 View this message in context: 
 http://old.nabble.com/DataImportHandler-Questions-Load-data-in-parallel-and-temp-tables-tp23266396p26371403.html
 Sent from the Solr - User mailing list archive at Nabble.com.





-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: javabin in .NET?

2009-11-16 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Mon, Nov 16, 2009 at 5:55 PM, Mauricio Scheffer
mauricioschef...@gmail.com wrote:
 Yep, I think I mostly nailed the unmarshalling. Need more tests though. And
 then integrate it to SolrNet.
 Is there any way (or are there any plans) to have an update handler that
 accepts javabin?
There is already one: look at BinaryRequestWriter.
But I would say that may not make a lot of difference, as indexing is a
back-end operation and slight perf improvements won't make much
difference.
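
A minimal SolrJ 1.4 sketch of enabling it on the Java client side (a SolrNet
equivalent would have to implement the same wire format; the URL is
hypothetical):

import org.apache.solr.client.solrj.impl.BinaryRequestWriter;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class BinaryUpdates {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");
        // From now on, update requests built through this server are serialized
        // in the javabin format (handled server-side by the javabin update
        // handler, /update/javabin in the 1.4 example solrconfig) instead of XML.
        server.setRequestWriter(new BinaryRequestWriter());
    }
}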





-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Solr 1.3 query and index perf tank during optimize

2009-11-16 Thread Jerome L Quinn


Otis Gospodnetic otis_gospodne...@yahoo.com wrote on 11/13/2009 11:15:43
PM:

 Let's take a step back.  Why do you need to optimize?  You said: As
 long as I'm not optimizing, search and indexing times are
satisfactory. :)

 You don't need to optimize just because you are continuously adding
 and deleting documents.  On the contrary!


That's a fair question.

Basically, search entries are keyed to other documents.  We have finite
storage,
so we purge old documents.  My understanding was that deleted documents
still
take space until an optimize is done.  Therefore, if I don't optimize, the
index
size on disk will grow without bound.

Am I mistaken?  If I don't ever have to optimize, it would make my life
easier.

Thanks,
Jerry


Re: Stop solr without losing documents

2009-11-16 Thread Michael
On Fri, Nov 13, 2009 at 4:09 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:
 please don't kill -9 ... it's grossly overkill, and doesn't give your
[ ... snip ... ]
 Alternately, you could take advantage of the enabled feature from your
 client (just have it test the enabled URL every N updates or so) and when
 it sees that you have disabled the port it can send one last commit and
 then stop sending updates until it sees the enabled URL work again -- as
 soon as you see the updates stop, you can safely shut down the port.
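
A tiny sketch of that client-side check (assuming the ping handler is
configured with a healthcheck file, so /admin/ping stops returning 200 once
the core is disabled; the base URL is hypothetical):

import java.net.HttpURLConnection;
import java.net.URL;

public class EnabledCheck {
    // Call this every N updates; once it returns false, send one last commit
    // and stop sending updates before shutting the instance down.
    static boolean solrEnabled(String baseUrl) {
        try {
            HttpURLConnection conn =
                (HttpURLConnection) new URL(baseUrl + "/admin/ping").openConnection();
            conn.setRequestMethod("GET");
            return conn.getResponseCode() == 200;
        } catch (Exception e) {
            return false;   // unreachable counts as disabled
        }
    }
}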

Thanks, Hoss.  I'll use Catalina stop instead of kill -9.

It's good to know about the enabled feature -- my team was just
discussing whether something like that existed that we could use --
but as we'd also like to recover cleanly from power failures and other
Solr terminations, I think we'll track which docs are uncommitted
outside of Solr.

Michael


ext3 vs ext4 vs xfs for solr....recommendations needed...

2009-11-16 Thread William Pierce
Folks:

For those of you experienced linux-solr hands, I am seeking recommendations
for which file system you think would work best with solr.  We are currently 
running with Ubuntu 9.04 on an amazon ec2 instance.  The default file system I 
think is ext3.  

 I am seeking, of course, to ensure good performance with stability.
What I have been reading is that ext4 may be a little too bleeding edge... but
I defer to those of you who know more about this...

Thanks,

- Bill

Re: Stop solr without losing documents

2009-11-16 Thread Michael
On Fri, Nov 13, 2009 at 11:02 PM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:
 So I think the question is really:
 If I stop the servlet container, does Solr issue a commit in the shutdown 
 hook in order to ensure all buffered docs are persisted to disk before the 
 JVM exits.

Exactly right, Otis.

 I don't have the Solr source handy, but if I did, I'd look for Shutdown, 
 Hook and finalize in the code.

Thanks for the direction.  There was some talk of close()ing a
SolrCore that I found, but I don't believe this meant a commit.

I somehow hadn't thought of actually *trying* to add a doc and then
shut down a Solr instance; shame on me.  Unfortunately, when I test
this via
 * make a new solr
 * add a doc
 * commit
 * verify it shows up in a search -- it does
 * add a 2nd doc
 * shutdown
solr doesn't stop.  It stops accepting connections, but java refuses
to actually die.  Not sure what we're doing wrong on our end, but I
see this frequently and end up having to do a kill (usually not -9!).
I guess we'll stick with externally tracking which docs have
committed, so that when we inevitably have to kill Solr it doesn't
cause a problem.

Michael


Re: Stop solr without losing documents

2009-11-16 Thread Michael
On Fri, Nov 13, 2009 at 11:45 PM, Lance Norskog goks...@gmail.com wrote:
 I would go with polling Solr to find what is not yet there. In
 production, it is better to assume that things will break, and have
 backstop janitors that fix them. And then test those janitors
 regularly.

Good idea, Lance.  I certainly agree with the idea of backstop
janitors.  We don't have a good way of polling Solr for what's in
there or not -- we have a kind of asynchronous, multithreaded updating
system sending docs to Solr -- but we always can find out *externally*
which docs have been committed or not.

Michael


Re: ext3 vs ext4 vs xfs for solr....recommendations needed...

2009-11-16 Thread Mark Miller
William Pierce wrote:
 Folks:

 For those of your experienced linux-solr hands, I am seeking recommendations 
 for which file system you think would work best with solr.  We are currently 
 running with Ubuntu 9.04 on an amazon ec2 instance.  The default file system 
 I think is ext3.  

  I am seeking, of course, to ensure good performance with
  stability.  What I have been reading is that ext4 may be a little too
  bleeding edge... but I defer to those of you who know more about this...

 Thanks,

 - Bill
   
I'd prob stick to ext3 - there appear to be quite a few wins in terms of
access speed, but ext4 has some sort of issue with writes - I think it
involves fsync, which lucene/solr uses for an index commit. If you have
Lucene's autocommit turned on (off by default, and removed in Lucene
3.0), the speed on ext4 is just hammered for indexing. It's not so bad
without autocommit (fewer fsyncs, as they should only occur on Solr
commits), but it makes the upgrade less compelling certainly.

You can see the hit in this SQLite insert test - I'm guessing it's the
same issue:

http://www.phoronix.com/scan.php?page=article&item=ext4_btrfs_nilfs2&num=2


-- 
- Mark

http://www.lucidimagination.com





Index time boosting troubles

2009-11-16 Thread Jón Helgi Jónsson
Hi,

I had working index-time boosting on documents, like so: <doc boost="10.0">

Everything was great until I made some changes that I thought were not
related to the doc boost, but after that my doc boosting appears to be
missing.

I'm having a tough time debugging this and didn't have the sense to version
control this so I would have something to revert to (lesson learned).

In schema.xml I have <fieldType name="float" class="solr.FloatField"
omitNorms="false"/>

Is there something else I should be watching out for? Some query parameter
perhaps?

Or something else? I think wildcards in the query affect it, but I don't have
any. Some setting in solrconfig.xml or schema.xml?

Thanks!
Jon


Re: Some guide about setting up local/geo search at solr

2009-11-16 Thread Bertie Shen
Localsolr is not in contrib yet. I am interested in knowing whether
currently there is a better solution for setting up a local search.

Cheers.



On Sun, Nov 15, 2009 at 9:25 PM, Otis Gospodnetic 
otis_gospodne...@yahoo.com wrote:

 Nota bene:
 My understanding is the external versions of Local Lucene/Solr are
 eventually going to be deprecated in favour of what we have in contrib.
  Here's a stub page with a link to the spatial JIRA issue:
 http://wiki.apache.org/solr/SpatialSearch

 Otis
 --
 Sematext is hiring -- http://sematext.com/about/jobs.html?mls
 Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



 - Original Message 
  From: Bertie Shen bertie.s...@gmail.com
  To: solr-user@lucene.apache.org
  Sent: Sat, November 14, 2009 3:32:01 AM
  Subject: Some guide about setting up local/geo search at solr
 
  Hey,
 
   I spent some time figuring out how to set up local/geo/spatial search in
   Solr. I hope the following description can help, given the current
  status.
 
  1) Download localsolr. I download it from
  http://developer.k-int.com/m2snapshots/localsolr/localsolr/1.5/ and put
 jar
  file (in my case, localsolr-1.5.jar) in your application's WEB_INF/lib
  directory of application server.
 
  2) Download locallucene. I download it from
  http://sourceforge.net/projects/locallucene/ and put jar file (in my
 case,
   locallucene.jar in locallucene_r2.0/dist/ directory) in your application's
  WEB_INF/lib directory of application server. I also need to copy
  gt2-referencing-2.3.1.jar, geoapi-nogenerics-2.1-M2.jar, and
 jsr108-0.01.jar
  under locallucene_r2.0/lib/ directory to WEB_INF/lib. Do not copy
  lucene-spatial-2.9.1.jar under Lucene codebase. The namespace has been
  changed from com.pjaol.blah.blah.blah to org.apache.blah blah.
 
  3) Update your solrconfig.xml and schema.xml. I copy it from
  http://www.gissearch.com/localsolr.
 
  4) Restart application server and try a query
   /solr/select?qt=geo&lat=xx.xx&long=yy.yy&q=abc&radius=zz.




$DeleteDocbyQuery in solr 1.4 is not working

2009-11-16 Thread Mark Ellul
Hi,

I have added a deleted field in my database, and am using the
DataImportHandler to add rows to the index...

I am using solr 1.4

I have added the deleted field to the query and the RegexTransformer,
and the field definition below:

<field column="$deleteDocByQuery"
       regex="^true$"
       replaceWith="id:${List.id}" sourceColName="deleted"/>

When I run the deltaImport command... I see the below output

INFO: [] webapp=/solr path=/dataimport
params={command=delta-import&debug=true&expungeDeletes=true} status=0
QTime=1
Nov 16, 2009 5:29:10 PM org.apache.solr.handler.dataimport.DataImporter
doDeltaImport
INFO: Starting Delta Import
Nov 16, 2009 5:29:10 PM org.apache.solr.handler.dataimport.SolrWriter
readIndexerProperties
INFO: Read dataimport.properties
Nov 16, 2009 5:29:10 PM org.apache.solr.handler.dataimport.DocBuilder
doDelta
INFO: Starting delta collection.
Nov 16, 2009 5:29:10 PM org.apache.solr.handler.dataimport.DocBuilder
collectDelta
INFO: Running ModifiedRowKey() for Entity: List
Nov 16, 2009 5:29:10 PM org.apache.solr.handler.dataimport.JdbcDataSource$1
call
INFO: Creating a connection for entity List with URL:
jdbc:postgresql://localhost:5432/tlists
Nov 16, 2009 5:29:10 PM org.apache.solr.handler.dataimport.JdbcDataSource$1
call
INFO: Time taken for getConnection(): 4
Nov 16, 2009 5:29:10 PM org.apache.solr.handler.dataimport.DocBuilder
collectDelta
INFO: Completed ModifiedRowKey for Entity: List rows obtained : 1
Nov 16, 2009 5:29:10 PM org.apache.solr.handler.dataimport.DocBuilder
collectDelta
INFO: Completed DeletedRowKey for Entity: List rows obtained : 0
Nov 16, 2009 5:29:10 PM org.apache.solr.handler.dataimport.DocBuilder
collectDelta
INFO: Completed parentDeltaQuery for Entity: List
Nov 16, 2009 5:29:10 PM org.apache.solr.handler.dataimport.SolrWriter
deleteByQuery
INFO: Deleting documents from Solr with query: id:api__list__365522
Nov 16, 2009 5:29:10 PM org.apache.solr.core.SolrDeletionPolicy onInit
INFO: SolrDeletionPolicy.onInit: commits:num=1
 
commit{dir=/mnt/solr-index/index,segFN=segments_r,version=1257863009839,generation=27,filenames=[_bg.fdt,
_bg.tii, segments_r, _bg.fnm, _bg.nrm, _bg.fdx, _bg.prx, _bg.tis, _bg.frq]
Nov 16, 2009 5:29:10 PM org.apache.solr.core.SolrDeletionPolicy
updateCommits
INFO: newest commit = 1257863009839
Nov 16, 2009 5:29:10 PM org.apache.solr.handler.dataimport.DocBuilder
doDelta
INFO: Delta Import completed successfully

It says it's deleting the document... but when I do the search it's still
showing up.

Any Ideas?

Regards

Mark


Config Relationship between MaxWarmingSearchers and StreamingUpdateSolrServer

2009-11-16 Thread Erik Earle
My application updates the master index frequently, sometimes very frequently.  
  Is there a good rule of thumb for configuring:

1) maxWarmingSearchers in the master
2) the SUSS thread pool size (and perhaps queue length) to match the server 
settings?


  


Re: SolrJ looping until I get all the results

2009-11-16 Thread Mck
On Mon, 2009-11-02 at 19:49 -0500, Paul Tomblin wrote:
 Here's what I'm thinking
 
 final static int MAX_ROWS = 100;
 int start = 0;
 query.setRows(MAX_ROWS);
 while (true)
 {
QueryResponse resp = solrChunkServer.query(query);
SolrDocumentList docs = resp.getResults();
if (docs.size() == 0)
  break;

   start += MAX_ROWS;
   query.setStart(start);
 } 

Why not, after the first limited fetch, read how many hits there are, and
on the second fetch get all the remaining documents?

Example code (see the do-while loop)
http://sesat.no/projects/sesat-kernel/xref/no/sesat/search/query/token/SolrTokenEvaluator.html#237
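
A rough SolrJ sketch of that two-fetch approach (names are assumptions; it is
only sensible when the total hit count fits in memory):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocumentList;

public class TwoFetch {
    static final int MAX_ROWS = 100;

    static SolrDocumentList fetchAll(SolrServer server, SolrQuery query) throws Exception {
        // First fetch: one small page, mainly to learn the total hit count.
        query.setStart(0);
        query.setRows(MAX_ROWS);
        QueryResponse first = server.query(query);
        SolrDocumentList all = first.getResults();
        long numFound = all.getNumFound();

        // Second fetch: everything that is left, in one request.
        if (numFound > all.size()) {
            query.setStart(all.size());
            query.setRows((int) (numFound - all.size()));
            all.addAll(server.query(query).getResults());
        }
        return all;
    }
}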

~mck



-- 
This above all: to thine own self be true. It must follow that you
cannot then be false to any man. Shakespeare 
| semb.wever.org | sesat.no | finn.no |




Re: Wildcards at the Beginning of a Search.

2009-11-16 Thread Jay Hill
There is a text_rev field type in the example schema.xml file in the
official release of 1.4. It uses the ReversedWildcardFilterFactory to reverse
a field. You can do a copyField from the field you want to use for leading
wildcard searches to a field using the text_rev field, and then do a regular
trailing wildcard search on the reversed field.

-Jay
http://www.lucidimagination.com


On Thu, Nov 12, 2009 at 4:41 AM, Jörg Agatz joerg.ag...@googlemail.comwrote:

 Is there maybe a way in Solr 1.4 to search with a wildcard at the beginning?

 In 1.3 I can't activate it.

 KingArtus



PhP, Solr and Delta Imports

2009-11-16 Thread Pablo Ferrari
Hello,

I have an already working Solr service based on full imports, connected via
PHP to a Zend Framework MVC (I connect it directly to the Controller).
I use the SolrClient class for PHP, which is great:
http://www.php.net/manual/en/class.solrclient.php

As of now, every time I want to edit a document I have to do a full import
again, or I can delete the document by its id and add it again with the
updated info...
Can anyone guide me a bit on how to do delta imports? If it's via PHP, even
better!

Thanks in advance,

Pablo Ferrari
Tinkerlabs.net


Re: PhP, Solr and Delta Imports

2009-11-16 Thread Israel Ekpo
On Mon, Nov 16, 2009 at 2:49 PM, Pablo Ferrari pabs.ferr...@gmail.comwrote:

 Hello,

 I have an already working Solr service based un full imports connected via
 php to a Zend Framework MVC (I connect it directly to the Controller).
 I use the SolrClient class for php which is great:
 http://www.php.net/manual/en/class.solrclient.php

 For now on, every time I want to edit a document I have to do a full import
 again or I can delete the document by its id and add it again with the
 updated info...
 Anyone can guide me a bit in how to do delta imports? If its via php,
 better!

 Thanks in advance,

 Pablo Ferrari
 Tinkerlabs.net



Hello Pablo,

You have a couple of options and you do not have to do a full data re-import
for the entire index.

My example below uses 'doc_id' as the uniqueKey field in your schema. It
also assumes that it is an integer type

1. You can remove the document from the index by query or by id (assuming
you have its id or uniqueKey field) if you want to just take it out of the
active index.

$client = new SolrClient($options);

$client->deleteById(400); // I recommend this one

OR

$client->deleteByQuery('doc_id:400'); // This should work too.

2. If all you want to do is to replace/update an existing document in the
Solr index and you still want the document to remain active in the index
then you can just update it by building a SolrInputDocument object and then
submitting just that document using the SolrClient.

$client = new SolrClient($options);

$doc = new SolrInputDocument();

$doc->addField('doc_id', 334455);
$doc->addField('other_field', 'Other Field Value');
$doc->addField('another_field', 'Another Field Value');

$updateResponse = $client->addDocument($doc);

If your changes are coming from the DB, it would be helpful to have a
timestamp column that changes each time the record is modified.

Then you can keep track of when the last index process was done and the next
time you can retrieve only 'active' documents that have been modified or
created after this last re-index process. You can send the
SolrInputDocuments to the Solr Index using the SolrClient object as shown
above for each document.

Do not forget to save the changes to the index with a call to
SolrClient::commit()

If you are updating a lot of records, I would recommend waiting till the end
to do the commit (and the optimize call, if needed).

More examples are available here

http://us2.php.net/manual/en/solr.examples.php

-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.


Re: Config Relationship between MaxWarmingSearchers and StreamingUpdateSolrServer

2009-11-16 Thread Otis Gospodnetic
Hi Erik,

I didn't look at the source code, and I think the javadoc for SUSS doesn't 
mention it, but I am under the impression that the number of threads to use 
should roughly match the number of CPU cores on the master.  The 
maxWarmingSearchers should only be relevant to slaves, not masters, no?

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
 From: Erik Earle erikea...@yahoo.com
 To: solr-user@lucene.apache.org
 Sent: Mon, November 16, 2009 1:20:23 PM
 Subject: Config Relationship between MaxWarmingSearchers and 
 StreamingUpdateSolrServer
 
 My application updates the master index frequently, sometimes very 
 frequently.  
   Is there a good rule of thumb for configuring:
 
 1) maxWarmingSearchers in the master
 2) the SUSS thread pool size (and perhaps queue length) to match the server 
 settings?



Re: Solr 1.3 query and index perf tank during optimize

2009-11-16 Thread Otis Gospodnetic
I'd have to verify this to be sure, but I *believe* deleted docs data is 
expunged during index segment merges.

See 
https://issues.apache.org/jira/browse/SOLR-1275

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR





Re: Solr - Load Increasing.

2009-11-16 Thread Otis Gospodnetic
Hi,

Your autoCommit settings are very aggressive.  I'm guessing that's what's 
causing the CPU load.

btw. what is laks?

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR






Re: Solr - Load Increasing.

2009-11-16 Thread Walter Underwood
Probably lakh: 100,000.

So, 900k qpd and 3M docs.

http://en.wikipedia.org/wiki/Lakh

wunder




RE: Solr - Load Increasing.

2009-11-16 Thread Sudarsan, Sithu D.
 
Hi,

Lakh or Lac - 100,000
Crore   - 100,00,000 (ten million)

Commonly used in India

Sincerely,
Sithu D Sudarsan




Re: Solr - Load Increasing.

2009-11-16 Thread Israel Ekpo
On Mon, Nov 16, 2009 at 5:22 PM, Walter Underwood wun...@wunderwood.orgwrote:

 Probably lakh: 100,000.

 So, 900k qpd and 3M docs.

 http://en.wikipedia.org/wiki/Lakh

 wunder

 On Nov 16, 2009, at 2:17 PM, Otis Gospodnetic wrote:

  Hi,
 
  Your autoCommit settings are very aggressive.  I'm guessing that's what's
 causing the CPU load.
 
  btw. what is laks?
 
  Otis
  --
  Sematext is hiring -- http://sematext.com/about/jobs.html?mls
  Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
 
 
 
  - Original Message 
  From: kalidoss kalidoss.muthuramalin...@sifycorp.com
  To: solr-user@lucene.apache.org
  Sent: Mon, November 16, 2009 9:11:21 AM
  Subject: Solr - Load Increasing.
 
  Hi All.
 
My server solr box cpu utilization  increasing b/w 60 to 90% and some
 time
  solr is getting down and we are restarting it manually.
 
No of documents in solr 30 laks.
No of add/update requrest solr 30 thousand / day. Avg of every 30
 minutes
  around 500 writes.
No of search request 9laks / day.
Size of the data directory: 4gb.
 
 
My system ram is 8gb.
System available space 12gb.
processor Family: Pentium Pro
 
Our solr data size can be increase in number like 90 laks. and writes
 per day
  will be around 1laks.   - Hope its possible by solr.
 
For write commit i have configured like
 
    <autoCommit>
      <maxDocs>1</maxDocs>
      <maxTime>10</maxTime>
    </autoCommit>
 
 
Is all above can be possible? 90laks datas and 1laks per day writes
 and
  30laks per day read??  - if yes what type of system configuration would
 require.
 
Please suggest us.
 
  thanks,
  Kalidoss.m,
 
 




Thanks Walter for clarifying that.

I too was wondering what laks meant.

It was a bit distracting when I read the original post.
-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.


Re: Solr - Load Increasing.

2009-11-16 Thread Shashi Kant
I think it would be useful for members of this list to realize that not
everyone uses the same metrology and terms.

It is very easy for Americans to use the imperial system and presume
everyone does the same; Europeans to use the metric system etc. Hopefully
members on this list would be persuaded to use or at least clarify their
terminology.

While the apocryphal saying goes that "the great thing about standards is that
there are so many to choose from", we should all make an effort to communicate
across cultures and nations.



On Mon, Nov 16, 2009 at 5:33 PM, Israel Ekpo israele...@gmail.com wrote:

 On Mon, Nov 16, 2009 at 5:22 PM, Walter Underwood wun...@wunderwood.org
 wrote:

  Probably lakh: 100,000.
 
  So, 900k qpd and 3M docs.
 
  http://en.wikipedia.org/wiki/Lakh
 
  wunder
 
  On Nov 16, 2009, at 2:17 PM, Otis Gospodnetic wrote:
 
   Hi,
  
   Your autoCommit settings are very aggressive.  I'm guessing that's
 what's
  causing the CPU load.
  
   btw. what is laks?
  
   Otis
   --
   Sematext is hiring -- http://sematext.com/about/jobs.html?mls
   Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
  
  
  
   - Original Message 
   From: kalidoss kalidoss.muthuramalin...@sifycorp.com
   To: solr-user@lucene.apache.org
   Sent: Mon, November 16, 2009 9:11:21 AM
   Subject: Solr - Load Increasing.
  
   Hi All.
  
 My server solr box cpu utilization  increasing b/w 60 to 90% and
 some
  time
   solr is getting down and we are restarting it manually.
  
 No of documents in solr 30 laks.
 No of add/update requrest solr 30 thousand / day. Avg of every 30
  minutes
   around 500 writes.
 No of search request 9laks / day.
 Size of the data directory: 4gb.
  
  
 My system ram is 8gb.
 System available space 12gb.
 processor Family: Pentium Pro
  
 Our solr data size can be increase in number like 90 laks. and
 writes
  per day
   will be around 1laks.   - Hope its possible by solr.
  
 For write commit i have configured like
  
   <autoCommit>
     <maxDocs>1</maxDocs>
     <maxTime>10</maxTime>
   </autoCommit>
  
  
 Is all above can be possible? 90laks datas and 1laks per day writes
  and
   30laks per day read??  - if yes what type of system configuration
 would
  require.
  
 Please suggest us.
  
   thanks,
   Kalidoss.m,
  
  
 
 


 Thanks Walter for clarifying that.

 I too was wondering what laks meant.

 It was a bit distracting when I read the original post.
 --
 Good Enough is not good enough.
 To give anything less than your best is to sacrifice the gift.
 Quality First. Measure Twice. Cut Once.



Re: Solr - Load Increasing.

2009-11-16 Thread Tom Alt
Nice to learn a new word for the day!

But to answer your question, or at least part of it, I don't really think
you want a configuration like

   <autoCommit>
     <maxDocs>1</maxDocs>
     <maxTime>10</maxTime>
   </autoCommit>

Committing every doc, and every 10 milliseconds? That's just asking for
problems. How about starting with 1000 docs, and five minutes for maxTime
(5*60*1000) or about 3 laks of milliseconds.
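
A minimal sketch of what that would look like in solrconfig.xml (the 1000
docs / 300000 ms values are just the starting point suggested above; tune
them for your actual write rate):

  <!-- commit after 1000 buffered docs or 5 minutes, whichever comes first -->
  <autoCommit>
    <maxDocs>1000</maxDocs>
    <maxTime>300000</maxTime>
  </autoCommit>

Longer commit intervals mean far fewer searcher re-opens and cache warm-ups,
which is usually where the CPU is going.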

That should help performance a lot. Try that, and see how it works.

Tom

On Mon, Nov 16, 2009 at 2:43 PM, Shashi Kant sk...@sloan.mit.edu wrote:

 I think it would be useful for members of this list to realize that not
 everyone uses the same metrology and terms.

 It is very easy for Americans to use the imperial system and presume
 everyone does the same; Europeans to use the metric system etc. Hopefully
 members on this list would be persuaded to use or at least clarify their
 terminology.

 While the apocryphal saying goes that "the great thing about standards is that
 there are so many to choose from", we should all make an effort to communicate
 across cultures and nations.



 On Mon, Nov 16, 2009 at 5:33 PM, Israel Ekpo israele...@gmail.com wrote:

  On Mon, Nov 16, 2009 at 5:22 PM, Walter Underwood wun...@wunderwood.org
  wrote:
 
   Probably lakh: 100,000.
  
   So, 900k qpd and 3M docs.
  
   http://en.wikipedia.org/wiki/Lakh
  
   wunder
  
   On Nov 16, 2009, at 2:17 PM, Otis Gospodnetic wrote:
  
Hi,
   
Your autoCommit settings are very aggressive.  I'm guessing that's
  what's
   causing the CPU load.
   
btw. what is laks?
   
Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
   
   
   
- Original Message 
From: kalidoss kalidoss.muthuramalin...@sifycorp.com
To: solr-user@lucene.apache.org
Sent: Mon, November 16, 2009 9:11:21 AM
Subject: Solr - Load Increasing.
   
Hi All.
   
  My server solr box cpu utilization  increasing b/w 60 to 90% and
  some
   time
solr is getting down and we are restarting it manually.
   
  No of documents in solr 30 laks.
  No of add/update requrest solr 30 thousand / day. Avg of every 30
   minutes
around 500 writes.
  No of search request 9laks / day.
  Size of the data directory: 4gb.
   
   
  My system ram is 8gb.
  System available space 12gb.
  processor Family: Pentium Pro
   
  Our solr data size can be increase in number like 90 laks. and
  writes
   per day
will be around 1laks.   - Hope its possible by solr.
   
  For write commit i have configured like
   
   <autoCommit>
     <maxDocs>1</maxDocs>
     <maxTime>10</maxTime>
   </autoCommit>
   
   
  Is all above can be possible? 90laks datas and 1laks per day
 writes
   and
30laks per day read??  - if yes what type of system configuration
  would
   require.
   
  Please suggest us.
   
thanks,
Kalidoss.m,
   
   
  
  
 
 
  Thanks Walter for clarifying that.
 
  I too was wondering what laks meant.
 
  It was a bit distracting when I read the original post.
  --
  Good Enough is not good enough.
  To give anything less than your best is to sacrifice the gift.
  Quality First. Measure Twice. Cut Once.
 



Re: exclude some fields from copying dynamic fields | schema.xml

2009-11-16 Thread Lance Norskog
Oh well. There is no direct feature for controlling what is copied.

If you use the DataImportHandler, you can include Java plugins or
Javascript/JRuby/Groovy code to do the copying.
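
As a rough, untested sketch of that approach (Solr 1.4's DataImportHandler
ScriptTransformer, which needs Java 6; the SQL query, the JDBC settings and
the "internal_s" field to skip are invented for illustration), the copy logic
can live in data-config.xml instead of a copyField:

  <dataConfig>
    <dataSource driver="org.hsqldb.jdbcDriver" url="jdbc:hsqldb:./example" user="sa"/>
    <script><![CDATA[
      function copyStrings(row) {
        // snapshot the keys, then add a *_str_s copy for every *_s field,
        // except the ones we explicitly want to leave alone
        var names = row.keySet().toArray();
        for (var i = 0; i < names.length; i++) {
          var name = '' + names[i];            // Java key -> JS string
          if (/_s$/.test(name) && name != 'internal_s') {
            row.put(name.replace(/_s$/, '_str_s'), row.get(names[i]));
          }
        }
        return row;
      }
    ]]></script>
    <document>
      <entity name="doc" transformer="script:copyStrings"
              query="select * from docs">
        <!-- ordinary field mappings go here -->
      </entity>
    </document>
  </dataConfig>

With the copying done in the transformer you can then drop the wildcard
copyField from schema.xml entirely.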

On Sun, Nov 15, 2009 at 9:37 PM, Vicky_Dev
vikrantv_shirbh...@yahoo.co.in wrote:

 Thanks for response

 Defining field is not working :(

 Is there any way to stop copy task for particular set of values

 Thanks
 ~Vikrant



 Lance Norskog-2 wrote:

 There is no direct way.

 Let's say you have a nocopy_s and you do not want a copy
 nocopy_str_s. This might work: declare nocopy_str_s as a field and
 make it not indexed and not stored. I don't know if this will work.

 It requires two overrides to work: 1) that declaring a field name that
 matches a wildcard will override the default wildcard rule, and 2)
 that stored=false indexed=false works.

 On Fri, Nov 13, 2009 at 3:23 AM, Vicky_Dev
 vikrantv_shirbh...@yahoo.co.in wrote:

 Hi,
 we are using the following entry in schema.xml to make a copy of one type
 of
 dynamic field to another :
 <copyField source="*_s" dest="*_str_s" />

 Is it possible to exclude some fields from copying.

 We are using Solr1.3

 ~Vikrant

 --
 View this message in context:
 http://old.nabble.com/exclude-some-fields-from-copying-dynamic-fields-%7C-schema.xml-tp26335109p26335109.html
 Sent from the Solr - User mailing list archive at Nabble.com.





 --
 Lance Norskog
 goks...@gmail.com



 --
 View this message in context: 
 http://old.nabble.com/exclude-some-fields-from-copying-dynamic-fields-%7C-schema.xml-tp26335109p26367099.html
 Sent from the Solr - User mailing list archive at Nabble.com.





-- 
Lance Norskog
goks...@gmail.com


Re: Newbie Solr questions

2009-11-16 Thread yz5od2
Thanks, so there is no way to create custom documents/fields via the
SolrJ client API at runtime?



On Nov 16, 2009, at 4:49 PM, Lance Norskog wrote:


There is no way to create custom documents/fields
via the SolrJ client @ runtime.




Re: Newbie Solr questions

2009-11-16 Thread Lance Norskog
Sorry, I did not answer the question. Yes, that's right. SolrJ can
only change the documents in the index. It has no power over the
metadata.
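
The usual workaround is dynamic fields: the field names still have to match a
pattern already declared in schema.xml, but within that pattern SolrJ can add
whatever fields it likes at runtime. A minimal sketch (the URL, id and field
names are placeholders, and it assumes schema.xml declares something like a
*_s dynamicField):

  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class DynamicFieldAdd {
      public static void main(String[] args) throws Exception {
          // point this at your own Solr instance
          SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", "doc1");
          // not declared explicitly in schema.xml; these match a pattern
          // such as <dynamicField name="*_s" type="string" ... />
          doc.addField("author_s", "Michael");
          doc.addField("size_s", "3MB");

          server.add(doc);
          server.commit();
      }
  }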


On Mon, Nov 16, 2009 at 4:00 PM, yz5od2 woods5242-outdo...@yahoo.com wrote:
 Thanks, so there is no way to create custom documents/fields via the SolrJ
 client API at runtime?


 On Nov 16, 2009, at 4:49 PM, Lance Norskog wrote:

 There is no way to create custom documents/fields
 via the SolrJ client @ runtime.





-- 
Lance Norskog
goks...@gmail.com


core size

2009-11-16 Thread Phil Hagelberg

I'm planning out a system with large indexes and wondering what kind
of performance boost I'd see if I split out documents into many cores
rather than using a single core and splitting by a field. I've got about
500GB worth of indexes ranging from 100MB to 50GB each.

I'm assuming if we split them out to multiple cores we would see the
most dramatic benefit in searches on the smaller cores, but I'm just
wondering what level of speedup I should expect. Eventually the cores
will be split up anyway, I'm just trying to determine how to prioritize
it.

thanks,
Phil
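
For what it's worth, the mechanical side of the split is just multi-core
plumbing in solr.xml; a minimal layout (core names and directories invented)
would look like:

  <solr persistent="true">
    <cores adminPath="/admin/cores">
      <core name="core-small" instanceDir="core-small" />
      <core name="core-large" instanceDir="core-large" />
    </cores>
  </solr>

Each core then gets its own conf/ and data/ directory and is queried at
/solr/<corename>/select, so the per-index sizes map one-to-one onto cores.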


Replication admin page auto-reload

2009-11-16 Thread Jay Hill
The replication admin page on slaves used to have an auto-reload set to
reload every few seconds. In the official 1.4 release this doesn't seem to
be working, but it does in a nightly build from early June. Was this changed
on purpose or is this a bug? I looked through CHANGES.txt to see if anything
was mentioned related to this but didn't see anything. If it's a bug I'll
open an issue in JIRA

-Jay


Re: Some guide about setting up local/geo search at solr

2009-11-16 Thread Otis Gospodnetic
Not that I know.  It's not in contrib, but if you apply that patch from 
http://wiki.apache.org/solr/SpatialSearch I am guessing it puts things in 
contrib/spatial.

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
 From: Bertie Shen bertie.s...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Mon, November 16, 2009 12:41:38 PM
 Subject: Re: Some guide about setting up local/geo search at solr
 
 Localsolr is not in contrib yet. I am interested in knowing whether
 currently there is a better solution for setting up a local search.
 
 Cheers.
 
 
 
 On Sun, Nov 15, 2009 at 9:25 PM, Otis Gospodnetic 
 otis_gospodne...@yahoo.com wrote:
 
  Nota bene:
  My understanding is the external versions of Local Lucene/Solr are
  eventually going to be deprecated in favour of what we have in contrib.
   Here's a stub page with a link to the spatial JIRA issue:
  http://wiki.apache.org/solr/SpatialSearch
 
  Otis
  --
  Sematext is hiring -- http://sematext.com/about/jobs.html?mls
  Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
 
 
 
  - Original Message 
   From: Bertie Shen 
   To: solr-user@lucene.apache.org
   Sent: Sat, November 14, 2009 3:32:01 AM
   Subject: Some guide about setting up local/geo search at solr
  
   Hey,
  
   I spent some times figuring out how to set up local/geo/spatial search at
   solr. I hope the following description can help  given the current
  status.
  
   1) Download localsolr. I download it from
   http://developer.k-int.com/m2snapshots/localsolr/localsolr/1.5/ and put
  jar
   file (in my case, localsolr-1.5.jar) in your application's WEB-INF/lib
   directory of application server.
  
   2) Download locallucene. I download it from
   http://sourceforge.net/projects/locallucene/ and put jar file (in my
  case,
   locallucene.jar in locallucene_r2.0/dist/ directory) in your application's
   WEB-INF/lib directory of application server. I also need to copy
   gt2-referencing-2.3.1.jar, geoapi-nogenerics-2.1-M2.jar, and
  jsr108-0.01.jar
   under locallucene_r2.0/lib/ directory to WEB-INF/lib. Do not copy
   lucene-spatial-2.9.1.jar under Lucene codebase. The namespace has been
   changed from com.pjaol.blah.blah.blah to org.apache.blah blah.
  
   3) Update your solrconfig.xml and schema.xml. I copy it from
   http://www.gissearch.com/localsolr.
  
   4) Restart application server and try a query
   /solr/select?qt=geo&lat=xx.xx&long=yy.yy&q=abc&radius=zz.
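
For reference, the solrconfig.xml additions that page describes boil down to
roughly the following. The class names are as I remember them from the
localsolr 1.5 jar, so verify them against your download; the lat/lng field
names and tier numbers are examples to adapt:

  <updateRequestProcessorChain>
    <processor class="com.pjaol.search.solr.update.LocalUpdateProcessorFactory">
      <str name="latField">lat</str>
      <str name="lngField">lng</str>
      <int name="startTier">9</int>
      <int name="endTier">16</int>
    </processor>
    <processor class="solr.RunUpdateProcessorFactory" />
    <processor class="solr.LogUpdateProcessorFactory" />
  </updateRequestProcessorChain>

  <searchComponent name="localsolr"
                   class="com.pjaol.search.solr.component.LocalSolrQueryComponent" />

  <requestHandler name="geo" class="org.apache.solr.handler.component.SearchHandler">
    <arr name="components">
      <str>localsolr</str>
      <str>facet</str>
      <str>mlt</str>
      <str>highlight</str>
      <str>debug</str>
    </arr>
  </requestHandler>

schema.xml then needs matching lat/lng fields plus the dynamic _local* fields
the update processor writes, per the same gissearch.com page.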
 
 



Re: core size

2009-11-16 Thread Otis Gospodnetic
If an index fits in memory, I am guessing you'll see the speed change roughly 
proportionally to the size of the index.  If an index does not fit into memory 
(i.e. disk head has to run around the disk to look for info), then the 
improvement will be even greater.  I haven't explicitly tested this and am 
hoping somebody will correct me if this is wrong.

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
 From: Phil Hagelberg p...@hagelb.org
 To: solr-user@lucene.apache.org
 Sent: Mon, November 16, 2009 8:42:49 PM
 Subject: core size
 
 
  I'm planning out a system with large indexes and wondering what kind
 of performance boost I'd see if I split out documents into many cores
 rather than using a single core and splitting by a field. I've got about
 500GB worth of indexes ranging from 100MB to 50GB each.
 
 I'm assuming if we split them out to multiple cores we would see the
 most dramatic benefit in searches on the smaller cores, but I'm just
 wondering what level of speedup I should expect. Eventually the cores
 will be split up anyway, I'm just trying to determine how to prioritize
 it.
 
 thanks,
 Phil



Re: Replication admin page auto-reload

2009-11-16 Thread Erik Hatcher


On Nov 17, 2009, at 2:48 AM, Jay Hill wrote:

The replication admin page on slaves used to have an auto-reload set to
reload every few seconds. In the official 1.4 release this doesn't seem to
be working, but it does in a nightly build from early June. Was this changed
on purpose or is this a bug? I looked through CHANGES.txt to see if anything
was mentioned related to this but didn't see anything. If it's a bug I'll
open an issue in JIRA


Noble changed this:

~/dev/solr/src/webapp/web/admin/replication: svn log header.jsp

r809125 | noble | 2009-08-29 14:46:54 +0200 (Sat, 29 Aug 2009) | 1 line

automatic refresh is very annoying. The user can do a refresh on his  
browser if required



~/dev/solr/src/webapp/web/admin/replication: svn diff -r800729:809125  
header.jsp

Index: header.jsp
===================================================================
--- header.jsp  (revision 800729)
+++ header.jsp  (revision 809125)
@@ -67,12 +67,7 @@
 
  NamedList namedlist = executeCommand("details", core, rh);
  NamedList detailsMap = (NamedList) namedlist.get("details");
-if(detailsMap != null)
-if("true".equals((String)detailsMap.get("isSlave"))){
 %>
-   <meta http-equiv="refresh" content="10"/>
-<%}%>
-
 </head>
 
 <body>
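
If you want the old behaviour back on a local build, re-adding the refresh tag
that this diff removes (guarded by the same isSlave check) to
src/webapp/web/admin/replication/header.jsp should do it -- roughly:

  <%
    if (detailsMap != null && "true".equals((String) detailsMap.get("isSlave"))) {
  %>
    <meta http-equiv="refresh" content="10"/>
  <%
    }
  %>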