Re: Newbie question on sorting
The easiest way is to do that in the app. That is, return the top 10 to the app (by score), then re-order them there. There's nothing in Solr that I know of that does what you want out of the box.

Best,
Erick

On Mon, Apr 30, 2012 at 11:10 AM, Jacek pjac...@gmail.com wrote:
Hello all, I'm facing this simple problem, yet impossible to resolve for me (I'm a newbie in Solr). I need to sort the results by score (that part is simple, of course), but then I need to take the top 10 results and re-order them (only those top 10 results) by a date field. It's not the same as sort=score,creationdate. Any suggestions will be greatly appreciated!
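Erick's suggestion of re-ordering in the app can be sketched as below. This is a minimal Python example with made-up documents; the field names (score, creationdate) and date format are assumptions, not taken from the original thread.

```python
# Re-sort the top-N results (already ranked by score) by a date field,
# client-side, as Erick suggests. Field names here are hypothetical.
from datetime import datetime

def reorder_top_by_date(results, n=10, date_field="creationdate"):
    """Take the top-n results as returned by Solr (score order) and
    re-order only those n by their date field, newest first."""
    top = results[:n]
    return sorted(
        top,
        key=lambda doc: datetime.strptime(doc[date_field], "%Y-%m-%dT%H:%M:%SZ"),
        reverse=True,
    )

docs = [
    {"id": "a", "score": 3.1, "creationdate": "2012-04-28T10:00:00Z"},
    {"id": "b", "score": 2.9, "creationdate": "2012-04-30T09:00:00Z"},
    {"id": "c", "score": 2.5, "creationdate": "2012-04-29T12:00:00Z"},
]
print([d["id"] for d in reorder_top_by_date(docs, n=3)])  # ['b', 'c', 'a']
```

Note this is exactly *not* sort=score,creationdate: the score ordering only decides which 10 docs are kept, and the date alone decides their final order.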
Re: post.jar failing
Works fine for me with address_xml as string type, indexed, stored on 3.6. What version of Solr are you using?

Best,
Erick

On Mon, Apr 30, 2012 at 4:18 PM, William Bell billnb...@gmail.com wrote:
I am getting a post.jar failure when trying to post the following CDATA field... It used to work on older versions. This is in Solr 3.6.

<add>
<doc>
  <field name="id">SP2514N</field>
  <field name="name">Samsung SpinPoint P120 SP2514N - hard drive - 250 GB - ATA-133</field>
  <field name="manu">Samsung Electronics Co. Ltd.</field>
  <field name="cat">electronics</field>
  <field name="cat">hard drive</field>
  <field name="features">7200RPM, 8MB cache, IDE Ultra ATA-133</field>
  <field name="features">NoiseGuard, SilentSeek technology, Fluid Dynamic Bearing (FDB) motor</field>
  <field name="price">92</field>
  <field name="popularity">6</field>
  <field name="inStock">true</field>
  <field name="address_xml"><![CDATA[<poffL> <poff><offL><off><ad1>2299 9th Ave N Ste 1A</ad1><city>St Petersburg</city><st>FL</st><zip>33713</zip><lat>27.781593</lat><lng>-82.663620</lng><phL/><faxL/></off></offL></poff> </poffL>]]></field>
  <field name="manufacturedate_dt">2006-02-13T15:26:37Z</field>
  <!-- Near Oklahoma city -->
  <field name="store">35.0752,-97.032</field>
</doc>
</add>

Apr 30, 2012 1:53:49 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: ERROR: [doc=SP2514N] Error adding field 'address_xml'='<eduL> <edu> <edTypC>MEDSCH</edTypC> <inst> <edNm>UNIVERSITY OF COLORADO SCHOOL OF MEDICINE</edNm> <yr>1974</yr> <deg>MD</deg> </inst> </edu> </eduL>'

--
Bill Bell
billnb...@gmail.com
cell 720-256-8076
Re: core sleep/wake
Well, that'll be kinda self-defeating. The whole point of auto-warming is to fill up the caches, consuming memory. Without that, searches will be slow. So the idea of using minimal resources is really antithetical to having these in-memory structures filled up. You can try configuring minimal caches etc. Or just give it lots of memory and count on your OS to swap the pages out if the particular core doesn't get used.

Best,
Erick

On Mon, Apr 30, 2012 at 5:18 PM, oferiko ofer...@gmail.com wrote:
I have a multicore Solr with a lot of cores that contain a lot of data (~50M documents) but are rarely used. Can I load a core from configuration but keep it in a sleep mode, where it has all the configuration available but hardly consumes resources, and based on a query or an update it will come to life? Thanks

--
View this message in context: http://lucene.472066.n3.nabble.com/core-sleep-wake-tp3951850.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: CJKBigram filter questions: single character queries, bigrams created across script/character types
I've no experience in the language nuances. I've found that I had to mix unigram phrase searches with free-text searches in bigram fields. This is for Chinese, not Japanese. The bigram idea comes about apparently because Chinese characters tend to be clumped into 2-3 character words, in a way that is not consistent across different kinds of text. I have no pretense of understanding the whys.

On Mon, Apr 30, 2012 at 2:21 PM, Burton-West, Tom tburt...@umich.edu wrote:
Thanks wunder, I really appreciate the help. Tom

--
Lance Norskog
goks...@gmail.com
Re: correct XPATH syntax
Hi David, I think you should add this option: flatten=true. Then could you try to use this XPath: /MedlineCitationSet/MedlineCitation/AuthorList/Author

See here for the description: http://wiki.apache.org/solr/DataImportHandler#Configuration_in_data-config.xml-1

I don't think that the commonField option is needed here; I think you should remove it.

Ludovic.
- Jouve, France.
Re: Removing old documents
Not sure if there is an automatic way, but we do it via a delete query, and where possible we update the doc under the same id to avoid deletes.

On 01/05/2012 13:43, Bai Shen baishen.li...@gmail.com wrote:
What is the best method to remove old documents? Things that now generate 404 errors, etc. Is there an automatic method, or do I have to do it manually? Thanks.
Re: should slave replication be turned off / on during master clean and re-index?
hello shawn, thanks for the reply.

ok - i did some testing and yes, you are correct. autocommit is doing the commit work in chunks. yes - the slaves are also going from having everything to nothing, then slowly building back up again, lagging behind the master. ... and yes - this is probably not what we need, as far as a replication strategy for the slaves.

you said you don't use autocommit. if so - why don't you use / like autocommit? since we have not done this here, there is no established reference point from an operations perspective. i am looking to formulate some sort of operations strategy, so ANY ideas or input are really welcome.

it seems to me that we have to account for two operational strategies. the first operational mode is a daily append to the solr core after the database tables have been updated. this can probably be done with a simple delta import. i would think that autocommit could remain on for the master, and replication could also be left on so the slaves pick up the changes ASAP. this seems like the mode that we would / should be in most of the time.

the second operational mode would be a build-from-scratch mode, where changes in the schema necessitate a full re-index of the data. given that our site (powered by solr) must be up all of the time, and that our full index time on the master (for the moment) is hovering somewhere around 16 hours, it makes sense that some sort of parallel path - with a cut-over - must be used. in this situation, is it possible to have the indexing process going on in the background, then have one commit at the end, then turn replication on for the slaves? are there disadvantages to this approach?

also - i really like your suggestion of a build core and a live core. is this the approach you use?

thank you for all of the great input
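For the build-core / live-core cut-over described above, Solr's CoreAdmin SWAP action can exchange the two cores once the background re-index and final commit are done. A minimal sketch of building that request follows; the host and core names ("live", "build") are assumptions for illustration.

```python
# Sketch: build the CoreAdmin SWAP request used to cut over from a
# freshly rebuilt "build" core to the serving "live" core.
# Host and core names are hypothetical.
from urllib.parse import urlencode

def swap_cores_url(host, live_core, build_core):
    """Return the CoreAdmin URL that swaps the two named cores."""
    params = {"action": "SWAP", "core": live_core, "other": build_core}
    return "http://%s/solr/admin/cores?%s" % (host, urlencode(params))

url = swap_cores_url("localhost:8983", "live", "build")
print(url)
```

After the swap, queries hitting "live" are served by the newly built index, so slaves replicating from the master pick up the new index in one step rather than tracking 16 hours of incremental commits.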
Re: post.jar failing
Please clarify the problem, because the error message you provide refers to address data that is not in the input data that you provide. It doesn't match! The error refers to an edu element, but the input data uses a poff element.

Maybe you have multiple SP2514N documents; maybe somebody made a copy of the original and edited the address_xml field value. And maybe that edited version that has an edu element has some obvious error.

In short, show us the full actual input address_xml field element, but preferably the entire Solr input document for the version of the SP2514N document that actually generates the error.

-- Jack Krupansky
Grouping ngroups count
Hello all, I tried to use grouping with 2 slices on an index of 35K documents. When I ask for the top 10 rows, grouped by field A, it gives me about 16K groups. But if I ask for the top 20K rows, the ngroups property is now at 30K. Do you know why, and of course how to fix it? Thanks.
Re: extracting/indexing HTML via cURL
Thank you, Jack. So it's not doable/possible to search and highlight keywords within a field that contains the raw formatted HTML, and strip out the HTML tags during analysis... so that a user would get back nothing if they did a search for (e.g.) p?

On Mon, Apr 30, 2012 at 5:17 PM, Jack Krupansky j...@basetechnology.com wrote:
I was thinking that you wanted to index the actual text from the HTML page, but have the stored field value still have the raw HTML with tags. If you just want to store only the raw HTML, a simple string field is sufficient, but then you can't easily do a text search on it. Or, you can have two fields: one string field for the raw HTML (stored, but not indexed), and then do a CopyField to a text field that has the HTMLStripCharFilter to strip the HTML tags and index only the text (indexed, but not stored).

-- Jack Krupansky

-----Original Message----- From: okayndc Sent: Monday, April 30, 2012 5:06 PM To: solr-user@lucene.apache.org Subject: Re: Solr: extracting/indexing HTML via cURL

Great, thank you for the input. My understanding of HTMLStripCharFilter is that it strips HTML tags, which is not what I want ~ is this correct? I want to keep the HTML tags intact.

On Mon, Apr 30, 2012 at 11:55 AM, Jack Krupansky j...@basetechnology.com wrote:
If by extracting HTML content via cURL you mean using SolrCell to parse HTML files, this seems to make sense. The sequence is that regardless of the file type, each file extraction parser will strip off all formatting and produce a raw text stream. Office, PDF, and HTML files are all treated the same in that way. Then, the unformatted text stream is sent through the field type analyzers to be tokenized into terms that Lucene can index. The input string to the field type analyzer is what gets stored for the field, but this occurs after the extraction file parser has already removed formatting. No way for the formatting to be preserved in that case, other than to go back to the original input document before extraction parsing.

If you really do want to preserve full HTML formatted text, you would need to define a field whose field type uses the HTMLStripCharFilter and then directly add documents that direct the raw HTML to that field. There may be some other way to hook into the update processing chain, but that may be too much effort compared to the HTML strip filter.

-- Jack Krupansky

-----Original Message----- From: okayndc Sent: Monday, April 30, 2012 10:07 AM To: solr-user@lucene.apache.org Subject: Solr: extracting/indexing HTML via cURL

Hello, over the weekend I experimented with extracting HTML content via cURL, and I was just wondering why the extraction/indexing process does not include the HTML tags. It seems as though the HTML tags are either being ignored or stripped somewhere in the pipeline. If this is the case, is it possible to include the HTML tags, as I would like to keep the formatted HTML intact? Any help is greatly appreciated.
Re: Solr Merge during off peak times
Hi Prabhu, I don't think such a merge policy exists, but it would be nice to have this option and I imagine it wouldn't be hard to write if you really just base the merge or no merge decision on the time of day (and maybe day of the week). Note that this should go into Lucene, not Solr, so if you decide to contribute your work, please see http://wiki.apache.org/lucene-java/HowToContribute Otis Performance Monitoring for Solr - http://sematext.com/spm From: Prakashganesh, Prabhu prabhu.prakashgan...@dowjones.com To: solr-user@lucene.apache.org solr-user@lucene.apache.org Sent: Tuesday, May 1, 2012 8:45 AM Subject: Solr Merge during off peak times Hi, I would like to know if there is a way to configure index merge policy in solr so that the merging happens during off peak hours. Can you please let me know if such a merge policy configuration exists? Thanks Prabhu
Re: extracting/indexing HTML via cURL
Sorry for the confusion. It is doable. If you feed the raw HTML into a field that has the HTMLStripCharFilter, the stored value will retain the HTML tags, while the indexed text will be stripped of the tags during analysis and be searchable just like a normal text field. Then, search will not see p.

-- Jack Krupansky

-----Original Message----- From: okayndc Sent: Tuesday, May 01, 2012 10:08 AM To: solr-user@lucene.apache.org Subject: Re: extracting/indexing HTML via cURL

Thank you, Jack. So it's not doable/possible to search and highlight keywords within a field that contains the raw formatted HTML, and strip out the HTML tags during analysis... so that a user would get back nothing if they did a search for (e.g.) p?
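Jack's two-field setup (raw HTML stored as-is, stripped text indexed for search) could be sketched in schema.xml roughly as below. The field names, the fieldType name, and the analyzer chain are assumptions for illustration, not taken from this thread.

```xml
<!-- Raw HTML kept verbatim for display/highlighting source (stored, not indexed). -->
<field name="html_raw" type="string" indexed="false" stored="true"/>
<!-- Searchable text with tags stripped at index time (indexed, not stored). -->
<field name="html_text" type="text_html" indexed="true" stored="false"/>
<copyField source="html_raw" dest="html_text"/>

<fieldType name="text_html" class="solr.TextField">
  <analyzer>
    <!-- Removes markup before tokenization, so "p" as a tag is not searchable. -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With this layout a search on html_text matches only the visible text, while the application reads html_raw to render the original formatting.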
Re: Removing old documents
I'm running Nutch, so it's updating the documents, but I want to remove ones that are no longer available. So in that case, there's no update possible.

On Tue, May 1, 2012 at 8:47 AM, mav.p...@holidaylettings.co.uk wrote:
Not sure if there is an automatic way, but we do it via a delete query, and where possible we update the doc under the same id to avoid deletes.
Re: extracting/indexing HTML via cURL
Awesome, I'll give it a try. Thanks Jack!

On Tue, May 1, 2012 at 10:23 AM, Jack Krupansky j...@basetechnology.com wrote:
Sorry for the confusion. It is doable. If you feed the raw HTML into a field that has the HTMLStripCharFilter, the stored value will retain the HTML tags, while the indexed text will be stripped of the tags during analysis and be searchable just like a normal text field. Then, search will not see p.

-- Jack Krupansky
Re: Removing old documents
Nutch 1.4 has a separate tool to remove 404 and redirect documents from your index based on your CrawlDB. Trunk's SolrIndexer can add and remove documents in one run based on segment data.

On Tuesday 01 May 2012 16:31:47 Bai Shen wrote:
I'm running Nutch, so it's updating the documents, but I want to remove ones that are no longer available. So in that case, there's no update possible.

--
Markus Jelsma - CTO - Openindex
Logging from data-config.xml
I'm getting this error (below) when doing an import. I'd like to add a Log line so I can see if the file path is messed up. So my data-config.xml looks like below, but I'm not getting any extra info in the solr.log file under jetty. Is there a way to log to this log file from data-config.xml?

<dataConfig>
  <dataSource type="FileDataSource" />
  <document>
    <entity name="medlineFileList" processor="FileListEntityProcessor"
            fileName=".*xml" rootEntity="false" dataSource="null"
            baseDir="/index_files/pubmed/">
      <entity name="medlineFiles" processor="XPathEntityProcessor"
              url="${medlineFileList.fileAblsolutePath}"
              forEach="/MedlineCitationSet/MedlineCitation"
              transformer="DateFormatTransformer,TemplateTransformer,RegexTransformer,LogTransformer"
              logTemplate="processing ${medlineFileList.fileAbsolutePath}"
              logLevel="info" stream="true">
        <field column="pmid" xpath="/MedlineCitationSet/MedlineCitation/PMID" commonField="true" />
        ...

Thanks.

INFO: Starting Full Import
May 1, 2012 10:34:29 AM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties
INFO: Read dataimport.properties
May 1, 2012 10:34:29 AM org.apache.solr.common.SolrException log
SEVERE: Exception while processing: medlineFileList document : null:org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.RuntimeException: java.io.FileNotFoundException: Could not find file:
at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:64)
at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:286)
at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:224)
at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:204)
at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:591)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:617)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:267)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:186)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:353)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:411)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:392)
Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: Could not find file:
at org.apache.solr.handler.dataimport.FileDataSource.getFile(FileDataSource.java:113)
at org.apache.solr.handler.dataimport.FileDataSource.getData(FileDataSource.java:85)
at org.apache.solr.handler.dataimport.FileDataSource.getData(FileDataSource.java:47)
at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:283)
... 10 more
Caused by: java.io.FileNotFoundException: Could not find file:
at org.apache.solr.handler.dataimport.FileDataSource.getFile(FileDataSource.java:111)
... 13 more
Re: Removing old documents
Hi, what I do is I store the date created for when the doc was inserted or updated, and then I run a search/delete query based on that.

Mav

On 01/05/2012 15:31, Bai Shen baishen.li...@gmail.com wrote:
I'm running Nutch, so it's updating the documents, but I want to remove ones that are no longer available. So in that case, there's no update possible.
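Mav's approach (stamp each document with a date on insert/update, then delete everything older than a cutoff) can be sketched as a delete-by-query update message. The field name last_indexed_dt is hypothetical; substitute whatever date field your schema uses.

```python
# Sketch: build a Solr delete-by-query message that removes documents
# whose last-indexed date is older than a cutoff. The field name
# "last_indexed_dt" is hypothetical.
def delete_older_than(cutoff_iso, date_field="last_indexed_dt"):
    """Return the XML update body deleting docs with date_field before cutoff."""
    query = "%s:[* TO %s]" % (date_field, cutoff_iso)
    return "<delete><query>%s</query></delete>" % query

msg = delete_older_than("2012-04-01T00:00:00Z")
print(msg)
# POST this body to /solr/update (Content-Type: text/xml), then commit.
```

After a full re-crawl refreshes the date on every still-reachable document, anything whose stamp predates the crawl start is gone from the source and can be deleted this way.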
Re: Logging from data-config.xml
Fixed the error. It was a stupid typo, but the log msg didn't appear until the typo was fixed. I would have thought they would be unrelated.

On 5/1/12 10:42 AM, "Twomey, David" david.two...@novartis.com wrote:
I'm getting this error (below) when doing an import. I'd like to add a Log line so I can see if the file path is messed up. ...
Re: get a total count
Hello, a related question on this topic: how do I programmatically find the total number of documents across many shards?

For EmbeddedSolrServer, I use the following call to get the total count: solrSearcher.getStatistics().get("numDocs")

With distributed search, how do I get the count of all records in all shards? Apart from doing a *:* query, is there a way to get the total count? I am not able to use the same call above because I am not able to get a handle to the SolrIndexSearcher object with distributed search. The conf and data directories of my index reside directly under a folder called solr (no core) under the weblogic domain. I don't have a SolrCore object. With EmbeddedSolrServer, I used to get the SolrIndexSearcher object using the following call: solrSearcher = (SolrIndexSearcher) SolrCoreObject.getSearcher().get();

Stack information:
OS: Solaris
JDK: 1.5.0_14 32-bit
Solr: 1.3
App Server: Weblogic 10MP1

Thank you.
- Rahul

On Tue, Nov 15, 2011 at 10:49 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:
I'm assuming the question was about how MANY documents have been indexed across all shards.

Answer #1: Look at the Solr Admin Stats page on each of your Solr instances and add up the numDocs numbers you see there.

Answer #2: Use Sematext's free Performance Monitoring tool for Solr. On the Index report, choose "all, sum" in the Solr Host selector, and that will show you the total # of docs across the cluster, total # of deleted docs, total segments, total size on disk, etc. URL: http://www.sematext.com/spm/solr-performance-monitoring/index.html

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

From: U Anonym uano...@gmail.com To: solr-user@lucene.apache.org Sent: Monday, November 14, 2011 11:50 AM Subject: get a total count
Hello everyone, a newbie question: how do I find out how many documents have been indexed across all shards? Thanks much!
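Without a SolrIndexSearcher handle, a distributed match-all query with rows=0 returns the cross-shard total in the response's numFound, with no documents fetched. A minimal sketch follows; the host and shard addresses are assumptions, and the parsing helper is run here against a hand-written sample response rather than a live server.

```python
# Sketch: total docs across shards via a distributed q=*:* with rows=0,
# reading numFound from the wt=json response. Hosts are hypothetical.
import json
from urllib.parse import urlencode

def count_all_url(host, shards):
    """Build the distributed count query: matches everything, returns no docs."""
    params = {"q": "*:*", "rows": 0, "wt": "json", "shards": ",".join(shards)}
    return "http://%s/solr/select?%s" % (host, urlencode(params))

def total_from_response(body):
    """Extract numFound (the cross-shard total) from a JSON response body."""
    return json.loads(body)["response"]["numFound"]

url = count_all_url("host1:8983", ["host1:8983/solr", "host2:8983/solr"])
# Example response body in the shape Solr returns:
sample = '{"response": {"numFound": 123456, "start": 0, "docs": []}}'
print(total_from_response(sample))  # 123456
```

Since rows=0, only the count travels over the wire, so this stays cheap even on large indexes.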
Re: get latest 50 documents the fastest way
Hi, the first thing that comes to mind is to not query with *:*, which I'm guessing you are doing, but to run a query with a time range constraint that you know will return enough docs, but not so many that performance suffers. And, of course, thinking beyond Solr, if you really know you always need the last 50, you could simply keep the last 50 in memory somewhere and get them from there, not from Solr, which should be faster.

Otis
Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm

From: Yuval Dotan yuvaldo...@gmail.com To: solr-user solr-user@lucene.apache.org Sent: Tuesday, May 1, 2012 10:38 AM Subject: get latest 50 documents the fastest way
Hi guys, we have a use case where we need to get the 50 *latest* documents that match my query - without additional ranking, sorting, etc. on the results. My index contains 1,000,000,000 documents, and I noticed that if the number of found documents is very big (larger than 50% of the index size - 500,000,000 docs) then it takes more than 5 seconds to get the results, even with the rows=50 parameter. Is there a way to get the results faster? Thanks, Yuval
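Otis's time-range idea can be sketched as query parameters: a filter query restricts matching to a recent window (so Solr never ranks the full billion docs), and a descending date sort puts the newest first. The timestamp field name and window are assumptions for illustration.

```python
# Sketch: fetch the newest docs by constraining to a recent time window
# instead of matching the whole index. The "timestamp" field name is
# hypothetical; use whatever date field the schema defines.
from urllib.parse import urlencode

def latest_docs_params(window_start_iso, rows=50, ts_field="timestamp"):
    """Build query params for the newest `rows` docs since window_start_iso."""
    return urlencode({
        "q": "*:*",
        "fq": "%s:[%s TO *]" % (ts_field, window_start_iso),  # narrow the match set
        "sort": "%s desc" % ts_field,                         # newest first
        "rows": rows,
    })

qs = latest_docs_params("2012-05-01T00:00:00Z")
print(qs)
```

If the window turns out to hold fewer than 50 docs, the client widens it and retries; the cost scales with the window size, not the index size.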
Re: post.jar failing
OK. I am using Solr 3.6. I restarted Solr and it started working. No idea why. You were right: I showed the error log from a different document. We might want to add a test case for CDATA.

<add>
  <doc>
    <field name="id">SP2514N</field>
    <field name="name">Samsung SpinPoint P120 SP2514N - hard drive - 250 GB - ATA-133</field>
    <field name="manu">Samsung Electronics Co. Ltd.</field>
    <field name="cat">electronics</field>
    <field name="cat">hard drive</field>
    <field name="features">7200RPM, 8MB cache, IDE Ultra ATA-133</field>
    <field name="features">NoiseGuard, SilentSeek technology, Fluid Dynamic Bearing (FDB) motor</field>
    <field name="price">92</field>
    <field name="popularity">6</field>
    <field name="inStock">true</field>
    <field name="address_xml"><![CDATA[<eduL><edu><edTypC>MEDSCH</edTypC><inst><edNm>UNIVERSITY OF COLORADO &amp; SCHOOL OF MEDICINE</edNm><yr>1974</yr><deg>MD</deg></inst></edu></eduL>]]></field>
    <field name="manufacturedate_dt">2006-02-13T15:26:37Z</field>
    <!-- Near Oklahoma city -->
    <field name="store">35.0752,-97.032</field>
  </doc>
</add>

On Tue, May 1, 2012 at 7:03 AM, Jack Krupansky j...@basetechnology.com wrote: Please clarify the problem, because the error message you provide refers to address data that is not in the input data that you provide. It doesn't match! The error refers to an edu element, but the input data uses a poff element. Maybe you have multiple SP2514N documents; maybe somebody made a copy of the original and edited the address_xml field value. And maybe that edited version that has an edu element has some obvious error. In short, show us the full actual input address_xml field element, but preferably the entire Solr input document, for the version of the SP2514N document that actually generates the error. -- Jack Krupansky -----Original Message----- From: William Bell Sent: Monday, April 30, 2012 4:18 PM To: solr-user@lucene.apache.org Subject: post.jar failing I am getting a post.jar failure when trying to post the following CDATA field... It used to work on older versions. This is in Solr 3.6.

<add>
  <doc>
    <field name="id">SP2514N</field>
    <field name="name">Samsung SpinPoint P120 SP2514N - hard drive - 250 GB - ATA-133</field>
    <field name="manu">Samsung Electronics Co. Ltd.</field>
    <field name="cat">electronics</field>
    <field name="cat">hard drive</field>
    <field name="features">7200RPM, 8MB cache, IDE Ultra ATA-133</field>
    <field name="features">NoiseGuard, SilentSeek technology, Fluid Dynamic Bearing (FDB) motor</field>
    <field name="price">92</field>
    <field name="popularity">6</field>
    <field name="inStock">true</field>
    <field name="address_xml"><![CDATA[<poffL><poff><offL><off><ad1>2299 9th Ave N Ste 1A</ad1><city>St Petersburg</city><st>FL</st><zip>33713</zip><lat>27.781593</lat><lng>-82.663620</lng><phL/><faxL/></off></offL></poff></poffL>]]></field>
    <field name="manufacturedate_dt">2006-02-13T15:26:37Z</field>
    <!-- Near Oklahoma city -->
    <field name="store">35.0752,-97.032</field>
  </doc>
</add>

Apr 30, 2012 1:53:49 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: ERROR: [doc=SP2514N] Error adding field 'address_xml'='<eduL> <edu> <edTypC>MEDSCH</edTypC> <inst> <edNm>UNIVERSITY OF COLORADO SCHOOL OF MEDICINE</edNm> <yr>1974</yr> <deg>MD</deg> </inst> </edu> </eduL>'
-- Bill Bell billnb...@gmail.com cell 720-256-8076 -- Bill Bell billnb...@gmail.com cell 720-256-8076
Latest solr4 snapshot seems to be giving me a lot of unhappy logging about 'Log4j', should I be concerned?
CoreContainer.java, in the method 'load', finds itself calling loader.newInstance with an 'fname' of 'Log4j' if the slf4j backend is Log4j. e.g.: 2012-05-01 10:40:32,367 org.apache.solr.core.CoreContainer - Unable to load LogWatcher org.apache.solr.common.SolrException: Error loading class 'Log4j' What is it actually looking for? Have I misplaced something?
Re: solr error after replacing schema.xml
PROBLEM RESOLVED. Solr 3.6.0 changed where it looks for stopwords_en.txt (now in the /lang sub-directory). The schema.xml generated by Haystack 2.0.0 beta needs to be edited. Everything is working now. - BillB1951 -- View this message in context: http://lucene.472066.n3.nabble.com/solr-error-after-relacing-schema-xml-tp3940133p3953115.html Sent from the Solr - User mailing list archive at Nabble.com.
question on word parsing control
I have a field that is defined using what I believe is a fairly standard text fieldType. I have documents with the words 'evaluate', 'evaluating', and 'evaluation' in them. When I search on the whole word, it obviously works; if I search on 'eval' it finds nothing. However, for some reason, if I search on 'evalu' it finds all the matches. Is there an indexing or query setting that makes 'evalu' match but not 'eval', and how do I get 'eval' to be a match? Thanks, Ken -- View this message in context: http://lucene.472066.n3.nabble.com/question-on-word-parsing-control-tp3952925.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: post.jar failing
Sounds as if maybe it was some other kind of error having nothing to do with the data itself. Were there any additional errors or exceptions shortly before the failure? Maybe memory was low and some component wouldn't load, or somebody caught an exception without reporting the actual cause. After all, the message you provided said nothing about the actual problem. Maybe Solr itself needs a better diagnostic in that case. -- Jack Krupansky
Re: question on word parsing control
This is a stemming artifact: all of the forms of evaluat* are being stemmed to evalu. That may seem odd, but stemming/stemmers are odd to begin with. 1. You could choose a different stemmer. 2. You could add synonyms to map various forms of the word to the desired form, such as eval. 3. Accept that Solr ain't perfect or optimal for every fine detail. 4. Or, maybe the stemmer behavior is technically perfect, but perfection can be subjective. In this particular case, you might consider a synonym rule such as eval => evaluate. -- Jack Krupansky -----Original Message----- From: kenf_nc Sent: Tuesday, May 01, 2012 9:23 AM To: solr-user@lucene.apache.org Subject: question on word parsing control I have a field that is defined using what I believe is fairly standard text fieldType. I have documents with the words 'evaluate', 'evaluating', 'evaluation' in them. When I search on the whole word, obviously it works, if I search on 'eval' it finds nothing. However for some reason if I search on 'evalu' it finds all the matches. Is that an indexing setting or query setting that will tokenize 'evalu' but not 'eval' and how do I get 'eval' to be a match? Thanks, Ken -- View this message in context: http://lucene.472066.n3.nabble.com/question-on-word-parsing-control-tp3952925.html Sent from the Solr - User mailing list archive at Nabble.com.
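Option 2 (synonyms) could look like the following schema.xml sketch. This assumes a Porter-stemmed text field; the type name text_eval and the synonyms.txt entry are illustrative, not from the thread:

```xml
<!-- schema.xml: map the truncated form onto a full form before stemming -->
<fieldType name="text_eval" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

With a line like `eval => evaluate` in synonyms.txt, a query for 'eval' is rewritten to 'evaluate' before stemming, so it lands on the same 'evalu' term the indexed documents produce.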
Re: Latest solr4 snapshot seems to be giving me a lot of unhappy logging about 'Log4j', should I be concerned?
There is a recent JIRA issue about keeping the last n logs to display in the admin UI. That introduced a problem - and then the fix introduced a problem - and then the fix mitigated the problem but left that ugly logging as a by product. Don't remember the issue # offhand. I think there was a dispute about what should be done with it. On May 1, 2012, at 11:14 AM, Benson Margulies wrote: CoreContainer.java, in the method 'load', finds itself calling loader.NewInstance with an 'fname' of Log4j of the slf4j backend is 'Log4j'. e.g.: 2012-05-01 10:40:32,367 org.apache.solr.core.CoreContainer - Unable to load LogWatcher org.apache.solr.common.SolrException: Error loading class 'Log4j' What is it actually looking for? Have I misplaced something? - Mark Miller lucidimagination.com
Email classification with solr
Hello, just a short question: is it possible to use Solr/Lucene as an e-mail classifier? I mean, analyzing an e-mail to add it automatically to a category (four are available)? Thanks, Ramo
Re: Latest solr4 snapshot seems to be giving me a lot of unhappy logging about 'Log4j', should I be concerned?
On Tue, May 1, 2012 at 12:16 PM, Mark Miller markrmil...@gmail.com wrote: There is a recent JIRA issue about keeping the last n logs to display in the admin UI. That introduced a problem - and then the fix introduced a problem - and then the fix mitigated the problem but left that ugly logging as a by product. Don't remember the issue # offhand. I think there was a dispute about what should be done with it. On May 1, 2012, at 11:14 AM, Benson Margulies wrote: CoreContainer.java, in the method 'load', finds itself calling loader.NewInstance with an 'fname' of Log4j of the slf4j backend is 'Log4j'. Couldn't someone just fix the if statement to say, 'OK, if we're doing log4j, we have no log watcher' and skip all the loud failing on the way? e.g.: 2012-05-01 10:40:32,367 org.apache.solr.core.CoreContainer - Unable to load LogWatcher org.apache.solr.common.SolrException: Error loading class 'Log4j' What is it actually looking for? Have I misplaced something? - Mark Miller lucidimagination.com
Re: hierarchical faceting?
yup.

<fieldType name="cq_tag" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="$"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

<field name="colors" type="cq_tag" indexed="true" stored="true" multiValued="true"/> <!-- red$pink, blue ... -->
<field name="colors_facet" type="string" indexed="true" stored="false" multiValued="true"/> <!-- red$pink, blue ... -->
<copyField source="colors" dest="colors_facet"/>

and ?facet.field=colors_facet On Mon, Apr 30, 2012 at 9:35 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : Is there a tokenizer that tokenizes the string as one token? Using KeywordTokenizer at query time should do what you want. -Hoss
RE: Grouping ngroups count
Hello, When you say 2 slices, do you mean 2 shards? As in, you're doing a distributed query? If you're doing a distributed query, then for group.ngroups to work you need to ensure that all documents for a group exist on a single shard. However, what you're describing sounds an awful lot like this JIRA issue that I entered a while ago for distributed grouping. I found that the hit count was coming only from the shards that ended up having results in the documents that were returned. I didn't test group.ngroups at the time. https://issues.apache.org/jira/browse/SOLR-3316 If this is a similar issue then you should make a new JIRA issue. Cody -----Original Message----- From: Francois Perron [mailto:francois.per...@wantedanalytics.com] Sent: Tuesday, May 01, 2012 6:47 AM To: solr-user@lucene.apache.org Subject: Grouping ngroups count Hello all, I tried to use grouping with 2 slices on an index of 35K documents. When I ask for the top 10 rows, grouped by field A, it gives me about 16K groups. But if I ask for the top 20K rows, the ngroups property is now at 30K. Do you know why and, of course, how to fix it? Thanks.
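For reference, a distributed grouping request of the kind being discussed might look like the sketch below. The host names and shard list are placeholders; the key point is that group.ngroups is only reliable when every document of a group lives on one shard:

```
http://host1:8983/solr/select?q=*:*&group=true&group.field=A
    &group.ngroups=true&shards=host1:8983/solr,host2:8983/solr
```

Routing documents to shards by a hash of field A (rather than round-robin) is one way to satisfy that co-location requirement.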
Re: Solr Parent/Child Searching
Hello Simon, Let me reply to solr-user. We consider BJQ a promising solution for the parent/child use case; we have a facet component prototype for it, but it's too raw and my team had to switch to other challenges temporarily. I participated in SOLR-3076, but the achievement is really modest: I've attached the essential BJQParser with god-mode syntax. I think the next stage should be block indexing support in Solr; I'm not sure how to do that right. I suppose that by next month I'll be able to provide something like essential support for block updates. Regards On Tue, May 1, 2012 at 12:05 AM, Simon Guindon simon.guindon wrote: Hello Mikhail, I came across your blog post about Solr with an alternative approach to the block join solution for LUCENE-3171. We have hit the same situation where we need the parent/child relationship for our Solr queries. I was wondering if your solution was available anywhere? It would be nice if a solution could make its way into Solr at some point :) Thanks and take care, Simon Guindon -- Sincerely yours Mikhail Khludnev. Tech Lead, Grid Dynamics. http://www.griddynamics.com mkhlud...@griddynamics.com
Re: post.jar failing
I am not sure. It just started working. -- Bill Bell billnb...@gmail.com cell 720-256-8076
How to integrate sen and lucene-ja in SOLR 3.x
Hi, Can anyone help me with how to integrate sen and lucene-ja.jar in Solr 3.4, 3.5, or 3.6? Thanks, Shanmugavel -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-integrate-sen-and-lucene-ja-in-SOLR-3-x-tp3953266.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Email classification with solr
There are a number of different routes you can go, one of which is to use SolrCell (Tika) to parse mbox files and then add your own update processor that does whatever mail classification analysis you desire and then generates additional field values for the classification. A simpler approach is to do the analysis yourself outside of Solr and then feed the mbox data for each message into SolrCell along with the specific literal field values derived from your classification analysis. SolrCell (Tika) would then parse the mail message and add your literal field values. Or, you may want to consider fully parsing the mail messages outside of Solr so that you have full control over what gets parsed and which schema fields are used or not used, in addition to your content analysis field values. -- Jack Krupansky -----Original Message----- From: Ramo Karahasan Sent: Tuesday, May 01, 2012 12:17 PM To: solr-user@lucene.apache.org Subject: Email classification with solr Hello, just a short question: Is it possible to use solr/Lucene as a e-mail classifier? I mean, analyzing an e-mail to add it automatically to a category (four are available)? Thanks, Ramo
Re: Upgrading to 3.6 broke cachedsqlentityprocessor
I know about one regression at least. A fix is already committed; see https://issues.apache.org/jira/browse/SOLR-3360 On Tue, May 1, 2012 at 12:53 AM, Brent Mills bmi...@uship.com wrote: I've read some things in JIRA on the new functionality that was put into caching in the DIH, but I wouldn't think it should break the old behavior. It doesn't look as though any errors are being thrown; it's just ignoring the caching part and opening a ton of connections. Also, I cannot find any documentation on the new functionality that was added, so I'm not sure what syntax is valid and what's not. Here is my entity that worked in 3.1 but no longer works in 3.6:

<entity name="Emails"
        query="SELECT * FROM Account.SolrUserSearchEmails WHERE '${dataimporter.request.clean}' != 'false' OR DateUpdated = dateadd(ss, -30, '${dataimporter.last_index_time}')"
        processor="CachedSqlEntityProcessor"
        where="UserID=Users.UserID"/>

-- Sincerely yours Mikhail Khludnev Tech Lead Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re: Does Solr fit my needs?
no problem - you are welcome. Nothing out-of-the-box yet; only the approach is ready: http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html https://issues.apache.org/jira/browse/SOLR-3076 Regards On Mon, Apr 30, 2012 at 12:06 PM, G.Long jde...@gmail.com wrote: Hi :) Thank you all for your answers. I'll try these solutions :) Kind regards, Gary On 27/04/2012 16:31, G.Long wrote: Hi there :) I'm looking for a way to save XML files into some sort of database, and I'm wondering if Solr would fit my needs. The XML files I want to save have a lot of child nodes, which also contain child nodes with multiple values. The depth can be more than 10 levels. After having indexed the files, I would like to be able to query for subparts of those XML files and to reconstruct them as XML files with all their children included. However, I'm wondering whether it is possible with an index like Solr/Lucene to keep or easily recover the structure of my XML data? Thanks for your help, Regards, Gary -- Sincerely yours Mikhail Khludnev Tech Lead Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re: Email classification with solr
Hi Jack, thanks for the feedback. I'm really new to this stuff and not sure I have fully understood it. Currently I've split emails into their properties and saved them into relational tables, for example the body part. Most of my e-mails are HTML emails. Now I have, for example, three categories; newsletter is one of them. I would like to classify incoming emails as newsletters if they fulfill a number of attributes, e.g. the sender's email address contains "newsletter" (or variants of this word) AND the content (body) looks like a newsletter. Is it possible to do that just with Solr? Or do I need other tools for classifying on the basis of text analysis? Isn't it necessary to build up a taxonomy for newsletter emails so that the classifier can match the mail text against some ruleset (a defined taxonomy)? Thanks, Ramo -----Original Message----- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Tuesday, May 1, 2012 18:49 To: solr-user@lucene.apache.org Subject: Re: Email classification with solr There are a number of different routes you can go, one of which is to use SolrCell (Tika) to parse mbox files and then add your own update processor that does whatever mail classification analysis you desire and then generates additional field values for the classification. A simpler approach is to do the analysis yourself outside of Solr and then feed the mbox data for each message into SolrCell along with the specific literal field values derived from your classification analysis. SolrCell (Tika) would then parse the mail message and add your literal field values. -- Jack Krupansky -----Original Message----- From: Ramo Karahasan Sent: Tuesday, May 01, 2012 12:17 PM To: solr-user@lucene.apache.org Subject: Email classification with solr Hello, just a short question: is it possible to use Solr/Lucene as an e-mail classifier? I mean, analyzing an e-mail to add it automatically to a category (four are available)? Thanks, Ramo
Re: Latest solr4 snapshot seems to be giving me a lot of unhappy logging about 'Log4j', should I be concerned?
I have a similar issue using log4j for logging with a trunk build: the CoreContainer class prints a big stack trace on our JBoss 4.2.2 startup. I am using slf4j 1.5.2. 10:07:45,918 WARN [CoreContainer] Unable to read SLF4J version java.lang.NoSuchMethodError: org.slf4j.impl.StaticLoggerBinder.getSingleton()Lorg/slf4j/impl/StaticLoggerBinder; at org.apache.solr.core.CoreContainer.load(CoreContainer.java:395) at org.apache.solr.core.CoreContainer.load(CoreContainer.java:355) at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:304) at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:101) On Tue, May 1, 2012 at 9:25 AM, Benson Margulies bimargul...@gmail.com wrote: On Tue, May 1, 2012 at 12:16 PM, Mark Miller markrmil...@gmail.com wrote: There is a recent JIRA issue about keeping the last n logs to display in the admin UI. That introduced a problem - and then the fix introduced a problem - and then the fix mitigated the problem but left that ugly logging as a by product. Don't remember the issue # offhand. I think there was a dispute about what should be done with it. On May 1, 2012, at 11:14 AM, Benson Margulies wrote: CoreContainer.java, in the method 'load', finds itself calling loader.newInstance with an 'fname' of 'Log4j' if the slf4j backend is Log4j. Couldn't someone just fix the if statement to say, 'OK, if we're doing log4j, we have no log watcher' and skip all the loud failing on the way? e.g.: 2012-05-01 10:40:32,367 org.apache.solr.core.CoreContainer - Unable to load LogWatcher org.apache.solr.common.SolrException: Error loading class 'Log4j' What is it actually looking for? Have I misplaced something? - Mark Miller lucidimagination.com
Re: AW: Email classification with solr
If you have the code that does all of that analysis, then you could integrate it with Solr using one of the approaches I listed, but Solr itself would not provide any of that analysis. -- Jack Krupansky -----Original Message----- From: Ramo Karahasan Sent: Tuesday, May 01, 2012 1:14 PM To: solr-user@lucene.apache.org Subject: Re: Email classification with solr Hi Jack, thanks for the feedback. I'm really new to this stuff and not sure I have fully understood it. Currently I've split emails into their properties and saved them into relational tables, for example the body part. Most of my e-mails are HTML emails. Now I have, for example, three categories; newsletter is one of them. I would like to classify incoming emails as newsletters if they fulfill a number of attributes, e.g. the sender's email address contains "newsletter" (or variants of this word) AND the content (body) looks like a newsletter. Is it possible to do that just with Solr? Or do I need other tools for classifying on the basis of text analysis? Isn't it necessary to build up a taxonomy for newsletter emails so that the classifier can match the mail text against some ruleset (a defined taxonomy)? Thanks, Ramo
dataimport handler (DIH) - notify when it has finished?
Hello all, is there a notification / trigger / callback mechanism people use that lets them know when a dataimport process has finished? We will be doing daily delta-imports and I need some way for an operations group to know when the DIH has finished. Thank you, -- View this message in context: http://lucene.472066.n3.nabble.com/dataimport-handler-DIH-notify-when-it-has-finished-tp3953339.html Sent from the Solr - User mailing list archive at Nabble.com.
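One low-tech approach is to poll the DIH status handler (/dataimport) until it reports idle. A minimal sketch of the parsing side, assuming a captured status response; the sample XML below is trimmed and hypothetical, not a verbatim Solr payload:

```python
import xml.etree.ElementTree as ET

# The DIH status handler reports "busy" while an import runs and "idle"
# when it is done. In production you would fetch this over HTTP on a timer.
sample_status = """
<response>
  <str name="status">idle</str>
  <str name="Total Documents Processed">1234</str>
</response>
"""

def import_finished(status_xml):
    """True once the DIH status element reads 'idle'."""
    root = ET.fromstring(status_xml)
    for s in root.iter('str'):
        if s.get('name') == 'status':
            return s.text == 'idle'
    return False

print(import_finished(sample_status))  # True
```

An operations script can poll like this after the nightly delta-import kicks off and page the team (or just log) when the status flips to idle.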
How to expand list into multi-valued fields?
I am indexing content from an RDBMS. I have a column in a table with pipe-separated values, and upon indexing I would like to transform these values into multi-valued fields in SOLR's index. For example, ColumnA (from RDBMS) - apple|orange|banana. I want to expand this to, SOLR Index: FruitField=apple FruitField=orange FruitField=banana, or, numbered: FruitField1=apple FruitField2=orange FruitField3=banana. Please help, thank you! -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-expand-list-into-multi-valued-fields-tp3953378.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Grouping ngroups count
Thanks for your response Cody. First, I used distributed grouping on 2 shards and I'm sure that all documents of each group are on the same shard. I took a look at the JIRA issue and it seems really similar. There is the same problem with group.ngroups: the count is calculated in the second pass, so we only get results from the useful shards, and that's why when I increase the rows limit I get the right count (it must use all my shards). Unless it's a feature (I hope not), I will create a new JIRA issue for this. Thanks On 2012-05-01, at 12:32 PM, Young, Cody wrote: Hello, When you say 2 slices, do you mean 2 shards? As in, you're doing a distributed query? If you're doing a distributed query, then for group.ngroups to work you need to ensure that all documents for a group exist on a single shard. However, what you're describing sounds an awful lot like this JIRA issue that I entered a while ago for distributed grouping. I found that the hit count was coming only from the shards that ended up having results in the documents that were returned. I didn't test group.ngroups at the time. https://issues.apache.org/jira/browse/SOLR-3316 If this is a similar issue then you should make a new JIRA issue. Cody -----Original Message----- From: Francois Perron [mailto:francois.per...@wantedanalytics.com] Sent: Tuesday, May 01, 2012 6:47 AM To: solr-user@lucene.apache.org Subject: Grouping ngroups count Hello all, I tried to use grouping with 2 slices on an index of 35K documents. When I ask for the top 10 rows, grouped by field A, it gives me about 16K groups. But if I ask for the top 20K rows, the ngroups property is now at 30K. Do you know why and, of course, how to fix it? Thanks.
Re: How to expand list into multi-valued fields?
Here you go: specify the regex transformer in the entity tag of the DIH config XML like below:

  <entity transformer="RegexTransformer" ... />

and then:

  <field column="ColumnA" name="FruitField" splitBy="\|" />

That's it! - Jeevanandam On 02-05-2012 12:35 am, invisbl wrote: I am indexing content from a RDBMS. I have a column in a table with pipe separated values, and upon indexing I would like to transform these values into multi-valued fields in SOLR's index. For example, ColumnA (From RDBMS) - apple|orange|banana I want to expand this to, SOLR Index FruitField=apple FruitField=orange FruitField=banana or number expand to, SOLR Index FruitField1=apple FruitField2=orange FruitField3=banana Please help, thank you! -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-expand-list-into-multi-valued-fields-tp3953378.html Sent from the Solr - User mailing list archive at Nabble.com.
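If it's ever more convenient to do the split outside DIH, the same transformation (splitBy="\|" producing a multi-valued field) is easy to reproduce client-side before posting the document; a minimal sketch using the field names from the example:

```python
def expand_pipe_field(row, column="ColumnA", field="FruitField"):
    """Mimic RegexTransformer's splitBy: one pipe-separated column
    becomes a multi-valued field (a list of values)."""
    return {field: row[column].split("|")}

doc = expand_pipe_field({"ColumnA": "apple|orange|banana"})
print(doc)  # {'FruitField': ['apple', 'orange', 'banana']}
```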
Re: Removing old documents
Hello, I ran bin/nutch solrclean crawl/crawldb http://127.0.0.1:8983/solr/ both without and with -noCommit, and restarted the Solr server. The log shows that 5 documents were removed, but they are still in the search results. Is this a bug, or is something missing? I use nutch-1.4 and solr 3.5. Thanks. Alex. -Original Message- From: Markus Jelsma markus.jel...@openindex.io To: solr-user solr-user@lucene.apache.org Sent: Tue, May 1, 2012 7:41 am Subject: Re: Removing old documents Nutch 1.4 has a separate tool to remove 404 and redirects documents from your index based on your CrawlDB. Trunk's SolrIndexer can add and remove documents in one run based on segment data. On Tuesday 01 May 2012 16:31:47 Bai Shen wrote: I'm running Nutch, so it's updating the documents, but I'm wanting to remove ones that are no longer available. So in that case, there's no update possible. On Tue, May 1, 2012 at 8:47 AM, mav.p...@holidaylettings.co.uk mav.p...@holidaylettings.co.uk wrote: Not sure if there is an automatic way but we do it via a delete query and where possible we update doc under same id to avoid deletes. On 01/05/2012 13:43, Bai Shen baishen.li...@gmail.com wrote: What is the best method to remove old documents? Things that no generate 404 errors, etc. Is there an automatic method or do I have to do it manually? THanks. -- Markus Jelsma - CTO - Openindex
Re: correct XPATH syntax
Ludovic, Thanks for your help. I tried your suggestion but it didn't work for Authors. Below are 3 snippets: the entity from data-config.xml, the source XML file, and the XML response from Solr.

Data-config:

  <entity name="medlineFiles" processor="XPathEntityProcessor"
          url="${medlineFileList.fileAbsolutePath}"
          forEach="/MedlineCitationSet/MedlineCitation"
          transformer="DateFormatTransformer,TemplateTransformer,RegexTransformer,LogTransformer"
          logTemplate="processing ${medlineFileList.fileAbsolutePath}" logLevel="info"
          flatten="true" stream="true">
    <field column="pmid" xpath="/MedlineCitationSet/MedlineCitation/PMID" commonField="true" />
    <field column="journal_name" xpath="/MedlineCitationSet/MedlineCitation/Article/Journal/Title" commonField="true" />
    <field column="title" xpath="/MedlineCitationSet/MedlineCitation/Article/ArticleTitle" commonField="true" />
    <field column="abstract" xpath="/MedlineCitationSet/MedlineCitation/Article/Abstract/AbstractText" commonField="true" />
    <field column="author" xpath="/MedlineCitationSet/MedlineCitation/Article/AuthorList/Author" commonField="false" />
    <field column="year" xpath="/MedlineCitationSet/MedlineCitation/Article/Journal/JournalIssue/PubDate/Year" commonField="true" />
  </entity>

XML snippet for Author:

  <AuthorList CompleteYN="Y">
    <Author ValidYN="Y">
      <LastName>Malathi</LastName>
      <ForeName>K</ForeName>
      <Initials>K</Initials>
    </Author>
    <Author ValidYN="Y">
      <LastName>Xiao</LastName>
      <ForeName>Y</ForeName>
      <Initials>Y</Initials>
    </Author>
    <Author ValidYN="Y">
      <LastName>Mitchell</LastName>
      <ForeName>A P</ForeName>
      <Initials>AP</Initials>
    </Author>
  </AuthorList>

Response from Solr:

  <arr name="author">
    <str/><str/><str/><str/><str/><str/><str/>
    <str/><str/><str/><str/><str/><str/><str/>
  </arr>
  <str name="journal_name">Journal of cancer research and clinical oncology</str>

Thanks David

On 5/1/12 8:05 AM, lboutros boutr...@gmail.com wrote: Hi David, I think you should add this option: flatten="true" and then could you try to use this XPath: /MedlineCitationSet/MedlineCitation/AuthorList/Author See here for the description:
http://wiki.apache.org/solr/DataImportHandler#Configuration_in_data-config.xml-1 I don't think that the commonField option is needed here; I think you should suppress it. Ludovic. - Jouve France. -- View this message in context: http://lucene.472066.n3.nabble.com/correct-XPATH-syntax-tp3951804p3952812.html Sent from the Solr - User mailing list archive at Nabble.com.
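The empty author values in David's response are consistent with the XPath selecting the Author element itself, whose text lives in child nodes like LastName. A quick way to check that intuition outside Solr (a sketch using Python's ElementTree; DIH's XPath support differs, so this is only illustrative):

```python
import xml.etree.ElementTree as ET

snippet = """
<AuthorList CompleteYN="Y">
  <Author ValidYN="Y"><LastName>Malathi</LastName><ForeName>K</ForeName></Author>
  <Author ValidYN="Y"><LastName>Xiao</LastName><ForeName>Y</ForeName></Author>
</AuthorList>
"""
root = ET.fromstring(snippet)

# Selecting Author yields elements whose own .text is empty (the data
# is in child elements), which matches the empty strings Solr returned:
authors = [(a.text or "").strip() for a in root.findall("Author")]
# Selecting the LastName children yields the actual values:
names = [n.text for n in root.findall("Author/LastName")]
print(authors, names)  # ['', ''] ['Malathi', 'Xiao']
```

This suggests pointing the author field's xpath at a text-bearing child (e.g. .../Author/LastName) rather than at Author itself.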
question on tokenization control
I have a field that is defined using what I believe is fairly standard text fieldType. I have documents with the words 'evaluate', 'evaluating', 'evaluation' in them. When I search on the whole word, obviously it works, if I search on 'eval' it finds nothing. However for some reason if I search on 'evalu' it finds all the matches. Is that an indexing setting or query setting that will tokenize 'evalu' but not 'eval' and how do I get 'eval' to be a match? Thanks, Ken -- View this message in context: http://lucene.472066.n3.nabble.com/question-on-tokenization-control-tp3953550.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: dataimport handler (DIH) - notify when it has finished?
On 1 May 2012 23:12, geeky2 gee...@hotmail.com wrote: Hello all, is there a notification / trigger / callback mechanism people use that allows them to know when a dataimport process has finished? we will be doing daily delta-imports and i need some way for an operations group to know when the DIH has finished. Never tried it myself, but this should meet your needs: http://wiki.apache.org/solr/DataImportHandler#EventListeners Regards, Gora
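Besides EventListeners, an operations group can also poll the DIH status URL and watch for the handler to go idle; a sketch (endpoint path and response shape assumed from a default /dataimport setup):

```python
import xml.etree.ElementTree as ET

def dih_is_idle(status_xml):
    """Parse a /dataimport?command=status response and report whether
    the import has finished (the status element reads 'idle')."""
    root = ET.fromstring(status_xml)
    for node in root.findall("str"):
        if node.get("name") == "status":
            return node.text == "idle"
    return False

# Example of the kind of XML the status command returns (simplified):
sample = """<response>
  <str name="status">idle</str>
  <str name="importResponse"/>
</response>"""
print(dih_is_idle(sample))  # True
```

In practice the ops script would fetch http://host:port/solr/dataimport?command=status on a schedule and alert once the status flips from busy back to idle.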
Boosting documents based on search term/phrase
Is there a way to boost documents based on the search term/phrase?
Re: core sleep/wake
My random searches can be a bit slow on startup, so I still would like to get that lazy load but have more cores available. I'm actually trying now the LotsOfCores way of handling things. Had to work a bit to get the patch suitable for 3.5, but it seems to be doing what I need. On Tue, May 1, 2012 at 2:31 PM, Erick Erickson erickerick...@gmail.com wrote: Well, that'll be kinda self-defeating. The whole point of auto-warming is to fill up the caches, consuming memory. Without that, searches will be slow. So the idea of using minimal resources is really antithetical to having these in-memory structures filled up. You can try configuring minimal caches etc. Or just give it lots of memory and count on your OS to swap the pages out if the particular core doesn't get used. Best Erick On Mon, Apr 30, 2012 at 5:18 PM, oferiko ofer...@gmail.com wrote: I have a multicore solr with a lot of cores that contain a lot of data (~50M documents), but are rarely used. Can I load a core from configuration but keep it in sleep mode, where it has all the configuration available but hardly consumes resources, and based on a query or an update it will come to life? Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/core-sleep-wake-tp3951850.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Boosting documents based on search term/phrase
Do you mean besides query elevation? http://wiki.apache.org/solr/QueryElevationComponent And besides explicit boosting by the user (the ^ suffix operator after a term/phrase)? -- Jack Krupansky -Original Message- From: Donald Organ Sent: Tuesday, May 01, 2012 3:59 PM To: solr-user Subject: Boosting documents based on search term/phrase Is there a way to boost documents based on the search term/phrase?
Re: question on tokenization control
Hi, Is that an indexing setting or query setting that will tokenize 'evalu' but not 'eval'? Without seeing the tokenizers you're using for the field type it's hard to say. You can use Solr's analysis page to see the tokens that are generated by the tokenizers in your analysis chain at both query time and index time. http://localhost:8983/solr/admin/analysis.jsp how do I get 'eval' to be a match? You could use synonyms to map 'eval' to 'evaluation'. Dan On Tue, May 1, 2012 at 8:17 PM, kfdroid kfdr...@gmail.com wrote: I have a field that is defined using what I believe is fairly standard text fieldType. I have documents with the words 'evaluate', 'evaluating', 'evaluation' in them. When I search on the whole word, obviously it works, if I search on 'eval' it finds nothing. However for some reason if I search on 'evalu' it finds all the matches. Is that an indexing setting or query setting that will tokenize 'evalu' but not 'eval' and how do I get 'eval' to be a match? Thanks, Ken -- View this message in context: http://lucene.472066.n3.nabble.com/question-on-tokenization-control-tp3953550.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: question on tokenization control
Use synonyms at index time. Make eval and evaluate equivalent words. wunder On May 1, 2012, at 1:31 PM, Dan Tuffery wrote: Hi, Is that an indexing setting or query setting that will tokenize 'evalu' but not 'eval'? Without seeing the tokenizers you're using for the field type it's hard to say. You can use Solr's analysis page to see the tokens that are generated by the tokenizers in your analysis chain at both query time and index time. http://localhost:8983/solr/admin/analysis.jsp how do I get 'eval' to be a match? You could use synonyms to map 'eval' to 'evaluation'. Dan On Tue, May 1, 2012 at 8:17 PM, kfdroid kfdr...@gmail.com wrote: I have a field that is defined using what I believe is fairly standard text fieldType. I have documents with the words 'evaluate', 'evaluating', 'evaluation' in them. When I search on the whole word, obviously it works, if I search on 'eval' it finds nothing. However for some reason if I search on 'evalu' it finds all the matches. Is that an indexing setting or query setting that will tokenize 'evalu' but not 'eval' and how do I get 'eval' to be a match? Thanks, Ken -- View this message in context: http://lucene.472066.n3.nabble.com/question-on-tokenization-control-tp3953550.html Sent from the Solr - User mailing list archive at Nabble.com. -- Walter Underwood wun...@wunderwood.org
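A note on why 'evalu' matches in the first place: the stock text field type typically ends in a Porter-style stemming filter, which reduces evaluate/evaluating/evaluation to the indexed token evalu, while eval is left alone. A toy illustration (a crude suffix-stripper for these words only, not the real Porter algorithm):

```python
def toy_stem(word):
    """Crude suffix stripping for illustration only (not real Porter).
    Longer suffixes are tried first so 'ating' wins over 'ate'."""
    for suffix in ("ating", "ation", "ate"):
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word

stems = {w: toy_stem(w) for w in ("evaluate", "evaluating", "evaluation")}
print(stems)             # all three map to 'evalu'
print(toy_stem("eval"))  # 'eval': untouched, so it never equals the indexed stem
```

That is why the synonym suggestion works: mapping eval to evaluate at index time lets the stemmer normalize both sides to the same token.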
Re: NPE when faceting
It may be related to this: http://stackoverflow.com/questions/10124055/solr-faceted-search-throws-nullpointerexception-with-http-500-status We are doing deletes from our index as well, so it is possible that we're running into the same issue. I hope that sheds more light on things. On Tue, May 1, 2012 at 4:51 PM, Jamie Johnson jej2...@gmail.com wrote: I had reported this issue a while back, hoping that it was something with my environment, but that doesn't seem to be the case. I am getting the following stack trace on certain facet queries. Previously when I did an optimize the error went away, does anyone have any insight into why specifically this could be happening? May 1, 2012 8:48:52 PM org.apache.solr.common.SolrException log SEVERE: java.lang.NullPointerException at org.apache.lucene.index.DocTermOrds.lookupTerm(DocTermOrds.java:807) at org.apache.solr.request.UnInvertedField.getTermValue(UnInvertedField.java:636) at org.apache.solr.request.UnInvertedField.getCounts(UnInvertedField.java:411) at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:300) at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:396) at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:205) at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:81) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:204) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1550) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119) at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111) at org.eclipse.jetty.server.Server.handle(Server.java:351) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47) at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:900) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:954) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:857) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66) at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534) at java.lang.Thread.run(Thread.java:662)
Re: NPE when faceting
Darn... looks likely that it's another bug from when part of UnInvertedField was refactored into Lucene. We really need some random tests that can catch bugs like these though - I'll see if I can reproduce. Can you open a JIRA issue for this? -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10 On Tue, May 1, 2012 at 4:51 PM, Jamie Johnson jej2...@gmail.com wrote: I had reported this issue a while back, hoping that it was something with my environment, but that doesn't seem to be the case. I am getting the following stack trace on certain facet queries. Previously when I did an optimize the error went away, does anyone have any insight into why specifically this could be happening? May 1, 2012 8:48:52 PM org.apache.solr.common.SolrException log SEVERE: java.lang.NullPointerException at org.apache.lucene.index.DocTermOrds.lookupTerm(DocTermOrds.java:807) at org.apache.solr.request.UnInvertedField.getTermValue(UnInvertedField.java:636) at org.apache.solr.request.UnInvertedField.getCounts(UnInvertedField.java:411) at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:300) at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:396) at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:205) at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:81) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:204) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1550) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484) at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111) at org.eclipse.jetty.server.Server.handle(Server.java:351) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47) at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:900) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:954) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:857) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66) at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534) at java.lang.Thread.run(Thread.java:662)
Re: Boosting documents based on search term/phrase
query elevation was exactly what I was talking about. Now is there a way to add this to the default query handler? On Tue, May 1, 2012 at 4:26 PM, Jack Krupansky j...@basetechnology.com wrote: Do you mean besides query elevation? http://wiki.apache.org/solr/QueryElevationComponent And besides explicit boosting by the user (the ^ suffix operator after a term/phrase)? -- Jack Krupansky -Original Message- From: Donald Organ Sent: Tuesday, May 01, 2012 3:59 PM To: solr-user Subject: Boosting documents based on search term/phrase Is there a way to boost documents based on the search term/phrase?
Re: Boosting documents based on search term/phrase
Yes, you can add it in the last-components section of the default query handler:

  <arr name="last-components">
    <str>elevator</str>
  </arr>

- Jeevanandam On 02-05-2012 3:53 am, Donald Organ wrote: query elevation was exactly what I was talking about. Now is there a way to add this to the default query handler? On Tue, May 1, 2012 at 4:26 PM, Jack Krupansky j...@basetechnology.com wrote: Do you mean besides query elevation? http://wiki.apache.org/solr/QueryElevationComponent And besides explicit boosting by the user (the ^ suffix operator after a term/phrase)? -- Jack Krupansky -Original Message- From: Donald Organ Sent: Tuesday, May 01, 2012 3:59 PM To: solr-user Subject: Boosting documents based on search term/phrase Is there a way to boost documents based on the search term/phrase?
Re: Boosting documents based on search term/phrase
Here's some doc from Lucid: http://lucidworks.lucidimagination.com/display/solr/The+Query+Elevation+Component -- Jack Krupansky -Original Message- From: Donald Organ Sent: Tuesday, May 01, 2012 5:23 PM To: solr-user@lucene.apache.org Subject: Re: Boosting documents based on search term/phrase query elevation was exactly what I was talking about. Now is there a way to add this to the default query handler? On Tue, May 1, 2012 at 4:26 PM, Jack Krupansky j...@basetechnology.com wrote: Do you mean besides query elevation? http://wiki.apache.org/solr/QueryElevationComponent And besides explicit boosting by the user (the ^ suffix operator after a term/phrase)? -- Jack Krupansky -Original Message- From: Donald Organ Sent: Tuesday, May 01, 2012 3:59 PM To: solr-user Subject: Boosting documents based on search term/phrase Is there a way to boost documents based on the search term/phrase?
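For reference, wiring up elevation takes two pieces: the searchComponent definition that the last-components entry points at, and an elevate.xml mapping query text to pinned document ids. A minimal sketch (the doc id is hypothetical, borrowed from the example docs):

```xml
<!-- solrconfig.xml: define the component referenced as "elevator" -->
<searchComponent name="elevator" class="solr.QueryElevationComponent">
  <str name="queryFieldType">string</str>
  <str name="config-file">elevate.xml</str>
</searchComponent>

<!-- elevate.xml: pin a document to the top for a given query -->
<elevate>
  <query text="hard drive">
    <doc id="SP2514N"/>
  </query>
</elevate>
```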
Re: Removing old documents
Maybe this is the HTTP caching feature? Solr comes with HTTP caching turned on by default, so when you query after making changes your browser does not re-fetch the changed documents. On Tue, May 1, 2012 at 11:53 AM, alx...@aim.com wrote: Hello, I did bin/nutch solrclean crawl/crawldb http://127.0.0.1:8983/solr/ without and with -noCommit and restarted solr server Log shows that 5 documents were removed but they are still in the search results. Is this a bug or something is missing? I use nutch-1.4 and solr 3.5 Thanks. Alex. -Original Message- From: Markus Jelsma markus.jel...@openindex.io To: solr-user solr-user@lucene.apache.org Sent: Tue, May 1, 2012 7:41 am Subject: Re: Removing old documents Nutch 1.4 has a separate tool to remove 404 and redirects documents from your index based on your CrawlDB. Trunk's SolrIndexer can add and remove documents in one run based on segment data. On Tuesday 01 May 2012 16:31:47 Bai Shen wrote: I'm running Nutch, so it's updating the documents, but I'm wanting to remove ones that are no longer available. So in that case, there's no update possible. On Tue, May 1, 2012 at 8:47 AM, mav.p...@holidaylettings.co.uk mav.p...@holidaylettings.co.uk wrote: Not sure if there is an automatic way but we do it via a delete query and where possible we update doc under same id to avoid deletes. On 01/05/2012 13:43, Bai Shen baishen.li...@gmail.com wrote: What is the best method to remove old documents? Things that no generate 404 errors, etc. Is there an automatic method or do I have to do it manually? THanks. -- Markus Jelsma - CTO - Openindex -- Lance Norskog goks...@gmail.com
Re: Removing old documents
I've been surprised to see Firefox cache JSON results even after an empty-cache was ordered... This is quite annoying, but I have gotten accustomed to it by doing the following when I need to debug: add an extra random parameter. But only when debugging! Using wget or curl showed me that the browser (and not Solr caching) was guilty of caching. I think the If-Modified-Since header might be the culprit; it would still be sent even after emptying the cache... paul On 1 May 2012 at 23:57, Lance Norskog wrote: Maybe this is the HTTP caching feature? Solr comes with HTTP caching turned on by default and so when you do queries and changes your browser does not fetch your changed documents. On Tue, May 1, 2012 at 11:53 AM, alx...@aim.com wrote: Hello, I did bin/nutch solrclean crawl/crawldb http://127.0.0.1:8983/solr/ without and with -noCommit and restarted solr server Log shows that 5 documents were removed but they are still in the search results. Is this a bug or something is missing? I use nutch-1.4 and solr 3.5 Thanks. Alex. -Original Message- From: Markus Jelsma markus.jel...@openindex.io To: solr-user solr-user@lucene.apache.org Sent: Tue, May 1, 2012 7:41 am Subject: Re: Removing old documents Nutch 1.4 has a separate tool to remove 404 and redirects documents from your index based on your CrawlDB. Trunk's SolrIndexer can add and remove documents in one run based on segment data. On Tuesday 01 May 2012 16:31:47 Bai Shen wrote: I'm running Nutch, so it's updating the documents, but I'm wanting to remove ones that are no longer available. So in that case, there's no update possible. On Tue, May 1, 2012 at 8:47 AM, mav.p...@holidaylettings.co.uk mav.p...@holidaylettings.co.uk wrote: Not sure if there is an automatic way but we do it via a delete query and where possible we update doc under same id to avoid deletes. On 01/05/2012 13:43, Bai Shen baishen.li...@gmail.com wrote: What is the best method to remove old documents? Things that no generate 404 errors, etc.
Is there an automatic method or do I have to do it manually? THanks. -- Markus Jelsma - CTO - Openindex -- Lance Norskog goks...@gmail.com
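Paul's debugging trick (appending a throwaway parameter so the browser cannot serve a cached copy) can be sketched like this; the URL and parameter names are illustrative:

```python
import random
from urllib.parse import urlencode

def cache_busted(base, params):
    """Append a throwaway '_' parameter so the browser treats each
    request as a distinct URL and cannot serve a stale cached response."""
    qs = dict(params, _=random.randrange(10**9))
    return base + "?" + urlencode(qs)

u1 = cache_busted("http://localhost:8983/solr/select", {"q": "*:*", "wt": "json"})
u2 = cache_busted("http://localhost:8983/solr/select", {"q": "*:*", "wt": "json"})
print(u1)
print(u1 != u2)  # the two URLs almost surely differ, defeating the cache
```

As Paul says, this is for debugging only; checking with wget or curl (which don't cache) is the cleaner way to rule out Solr itself.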
Error with distributed search and Suggester component (Solr 3.4)
Hi list, Does anybody know if the Suggester component is designed to work with shards? I'm asking because the documentation implies that it should (since ...Suggester reuses much of the SpellCheckComponent infrastructure…, and the SpellCheckComponent is documented as supporting a distributed setup). But when I make a request, I get an exception: java.lang.NullPointerException at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:493) at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:390) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:289) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157) at org.mortbay.servlet.UserAgentFilter.doFilter(UserAgentFilter.java:81) at org.mortbay.servlet.GzipFilter.doFilter(GzipFilter.java:132) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:923) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:547) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

Looking at the QueryComponent.java:493 code, I see:

  SolrDocumentList docs = (SolrDocumentList) srsp.getSolrResponse().getResponse().get("response");
  // calculate global maxScore and numDocsFound
  if (docs.getMaxScore() != null) {   // <-- this is line 493

So I'm assuming the docs variable is null, which would happen if there is no "response" element in the Solr response. If I make a direct request to the request handler in one core (e.g. http://hostname:8080/solr/core0/select?qt=suggest-core&q=rad), the query works. But I see that there's no element named "response", unlike a regular query:

  <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">1</int>
    </lst>
    <lst name="spellcheck">
      <lst name="suggestions">
        <lst name="rad">
          <int name="numFound">10</int>
          <int name="startOffset">0</int>
          <int name="endOffset">3</int>
          <arr name="suggestion">
            <str>radair</str>
            <str>radar</str>
          </arr>
        </lst>
      </lst>
    </lst>
  </response>

So I'm wondering if my configuration is just borked and this should work, or whether the fact that the Suggester doesn't return a response field means that it just doesn't work with shards. Thanks, -- Ken http://about.me/kkrugler +1 530-210-6378 -- Ken Krugler http://www.scaleunlimited.com custom big data solutions & training Hadoop, Cascading, Mahout & Solr
response codes from http update requests
Should I be concerned with the HTTP response codes from update requests? I can't find documentation anywhere on what values come back from them (although maybe I'm not looking hard enough). Are they just standard HTTP, with 200 for success and 400/500 for failures? Thanks, Richard
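As far as I know they do follow standard HTTP conventions: 2xx on success, 4xx for bad requests, 5xx for server-side errors, with details in the response body. A trivial client-side sketch of treating anything non-2xx as a failed update:

```python
def update_succeeded(status_code):
    """Treat any 2xx HTTP status from an /update request as success;
    anything else (4xx client error, 5xx server error) as failure."""
    return 200 <= status_code < 300

# e.g. check the code returned by your HTTP client after POSTing to /update
print(update_succeeded(200), update_succeeded(400), update_succeeded(500))
# True False False
```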
Re: Error with distributed search and Suggester component (Solr 3.4)
I should have also included one more bit of information. If I configure the top-level (sharding) request handler to use just the suggest component, like this:

  <requestHandler name="/suggest" class="org.apache.solr.handler.component.SearchHandler">
    <!-- default values for query parameters -->
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="shards.qt">suggest-core</str>
      <str name="shards">localhost:8080/solr/core0/,localhost:8080/solr/core1/</str>
    </lst>
    <arr name="components">
      <str>suggest</str>
    </arr>
  </requestHandler>

then I don't get an NPE, but I do get a response with no results:

  <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">0</int>
      <lst name="params">
        <str name="q">r</str>
      </lst>
    </lst>
  </response>

For completeness, here are the other pieces of the solrconfig.xml puzzle:

  <requestHandler class="org.apache.solr.handler.component.SearchHandler" name="suggest-core">
    <lst name="defaults">
      <str name="spellcheck">true</str>
      <str name="spellcheck.dictionary">suggest-one</str>
      <str name="spellcheck.count">10</str>
    </lst>
    <arr name="components">
      <str>suggest</str>
    </arr>
  </requestHandler>

  <searchComponent class="solr.SpellCheckComponent" name="suggest">
    <lst name="spellchecker">
      <str name="name">suggest-one</str>
      <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
      <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup</str>
      <str name="field">name</str> <!-- the indexed field to derive suggestions from -->
      <float name="threshold">0.05</float>
      <str name="buildOnCommit">true</str>
    </lst>
    <lst name="spellchecker">
      <str name="name">suggest-two</str>
      <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
      <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup</str>
      <str name="field">content</str> <!-- the indexed field to derive suggestions from -->
      <float name="threshold">0.0</float>
      <str name="buildOnCommit">true</str>
    </lst>
  </searchComponent>

Thanks, -- Ken On May 1, 2012, at 3:48pm, Ken Krugler wrote: Hi list, Does anybody know if the Suggester component is designed to work with shards?
I'm asking because the documentation implies that it should (since "...Suggester reuses much of the SpellCheckComponent infrastructure...", and the SpellCheckComponent is documented as supporting a distributed setup). But when I make a request, I get an exception:

java.lang.NullPointerException
        at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:493)
        at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:390)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:289)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
        at org.mortbay.servlet.UserAgentFilter.doFilter(UserAgentFilter.java:81)
        at org.mortbay.servlet.GzipFilter.doFilter(GzipFilter.java:132)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
        at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
        at org.mortbay.jetty.Server.handle(Server.java:326)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
        at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:923)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:547)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
        at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
        at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

Looking at the QueryComponent.java:493 code, I see:

SolrDocumentList docs = (SolrDocumentList)srsp.getSolrResponse().getResponse().get("response"); //
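For what it's worth, the NPE at mergeIds is consistent with the coordinating QueryComponent expecting a document list ("response") that a pure-suggest shard reply never contains — suggestions come back under the spellcheck section instead. One workaround is to query each core's suggest handler directly and merge client-side. A minimal sketch of such a merge (the shard payloads and function name below are made up for illustration, not Solr API):

```python
# Hypothetical client-side merge of suggestions fetched from each core
# directly, bypassing the sharding handler. Each shard payload here is a
# stand-in for the term/frequency pairs in a core's spellcheck response.
def merge_suggestions(shard_results, count=10):
    """Combine per-shard suggestion lists, summing frequencies per term."""
    totals = {}
    for suggestions in shard_results:
        for term, freq in suggestions:
            totals[term] = totals.get(term, 0) + freq
    # Highest combined frequency first, then alphabetically for stability.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:count]

core0 = [("solr", 12), ("solaris", 3)]
core1 = [("solr", 7), ("solder", 5)]
print(merge_suggestions([core0, core1]))
# [('solr', 19), ('solder', 5), ('solaris', 3)]
```

This sidesteps the distributed merge entirely, at the cost of one request per core from the application.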
Re: How to integrate sen and lucene-ja in SOLR 3.x
(12/05/02 1:47), Shanmugavel SRD wrote: Hi, Can anyone help me on how to integrate sen and lucene-ja.jar in SOLR 3.4 or 3.5 or 3.6 version? I think lucene-ja.jar no longer exists on the Internet, and it doesn't work with Lucene/Solr 3.x because the interface doesn't match (lucene-ja doesn't know about AttributeSource). Use lucene-gosen, the descendant project of sen/lucene-ja, instead. koji -- Query Log Visualizer for Apache Solr http://soleami.com/
Re: Removing old documents
all caching is disabled and I restarted jetty. The same results. Thanks. Alex. -Original Message- From: Lance Norskog goks...@gmail.com To: solr-user solr-user@lucene.apache.org Sent: Tue, May 1, 2012 2:57 pm Subject: Re: Removing old documents Maybe this is the HTTP caching feature? Solr comes with HTTP caching turned on by default, so when you query after making changes, your browser does not fetch your changed documents. On Tue, May 1, 2012 at 11:53 AM, alx...@aim.com wrote: Hello, I did bin/nutch solrclean crawl/crawldb http://127.0.0.1:8983/solr/ without and with -noCommit and restarted the solr server. The log shows that 5 documents were removed, but they are still in the search results. Is this a bug, or is something missing? I use nutch-1.4 and solr 3.5 Thanks. Alex. -Original Message- From: Markus Jelsma markus.jel...@openindex.io To: solr-user solr-user@lucene.apache.org Sent: Tue, May 1, 2012 7:41 am Subject: Re: Removing old documents Nutch 1.4 has a separate tool to remove 404 and redirect documents from your index based on your CrawlDB. Trunk's SolrIndexer can add and remove documents in one run based on segment data. On Tuesday 01 May 2012 16:31:47 Bai Shen wrote: I'm running Nutch, so it's updating the documents, but I want to remove ones that are no longer available. So in that case, there's no update possible. On Tue, May 1, 2012 at 8:47 AM, mav.p...@holidaylettings.co.uk wrote: Not sure if there is an automatic way, but we do it via a delete query, and where possible we update the doc under the same id to avoid deletes. On 01/05/2012 13:43, Bai Shen baishen.li...@gmail.com wrote: What is the best method to remove old documents? Things that now generate 404 errors, etc. Is there an automatic method or do I have to do it manually? Thanks. -- Markus Jelsma - CTO - Openindex -- Lance Norskog goks...@gmail.com
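If the cleaning tool isn't removing what you expect, the fallback mentioned above is an explicit delete query sent to Solr's update handler, followed by a commit. A minimal sketch of building the delete-by-query message (the example query strings are placeholders; posting it to a live server is left out):

```python
# Sketch: build a Solr delete-by-query update message. POSTing this body
# to http://host:8983/solr/update, then posting <commit/>, removes the
# matching documents. Substitute whatever query identifies your dead URLs.
from xml.sax.saxutils import escape

def delete_by_query(query):
    """Return the XML body for a Solr delete-by-query request."""
    return "<delete><query>%s</query></delete>" % escape(query)

print(delete_by_query("id:http\\://example.com/gone"))
# Remember: deletes are not visible in search results until a commit.
```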
Re: get latest 50 documents the fastest way
you should reverse your sort algorithm. Maybe you can override the tf method of Similarity and return -1.0f * tf() (I don't know whether the default collector allows scores smaller than zero). Or you can hack this by adding a large number, or write your own collector; in its collect(int doc) method you can do something like this: collect(int doc){ float score=scorer.score(); score*=-1.0f; } If you don't sort by relevance score, just set a Sort. On Tue, May 1, 2012 at 10:38 PM, Yuval Dotan yuvaldo...@gmail.com wrote: Hi Guys We have a use case where we need to get the 50 *latest* documents that match my query - without additional ranking, sorting, etc. on the results. My index contains 1,000,000,000 documents, and I noticed that if the number of found documents is very big (larger than 50% of the index size - 500,000,000 docs) then it takes more than 5 seconds to get the results, even with the rows=50 parameter. Is there a way to get the results faster? Thanks Yuval
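If relevance ranking isn't needed at all, the simplest route at the Solr level is to bypass scoring and sort on an indexed date field directly. A sketch of the request parameters (the field name `timestamp` is illustrative — it assumes such a field exists in your schema):

```python
# Sketch: ask Solr for the 50 newest matches by sorting on an indexed
# date field instead of relevance. "timestamp" is a placeholder name.
from urllib.parse import urlencode

params = {
    "q": "your query here",
    "sort": "timestamp desc",  # newest first; no relevance scoring needed
    "rows": 50,
    "fl": "id,timestamp",
}
print("/select?" + urlencode(params))
```

Sorting still has to visit all matching documents, but it avoids score computation and any app-side re-ranking.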
Re: Boosting documents based on search term/phrase
Hi, Can you please give an example of what you mean? Otis Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm From: Donald Organ dor...@donaldorgan.com To: solr-user solr-user@lucene.apache.org Sent: Tuesday, May 1, 2012 3:59 PM Subject: Boosting documents based on search term/phrase Is there a way to boost documents based on the search term/phrase?
Re: NPE when faceting
I don't have any more details than I provided here, but I created a ticket with this information. Thanks again https://issues.apache.org/jira/browse/SOLR-3427 On Tue, May 1, 2012 at 5:20 PM, Yonik Seeley yo...@lucidimagination.com wrote: Darn... looks likely that it's another bug from when part of UnInvertedField was refactored into Lucene. We really need some random tests that can catch bugs like these though - I'll see if I can reproduce. Can you open a JIRA issue for this? -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10 On Tue, May 1, 2012 at 4:51 PM, Jamie Johnson jej2...@gmail.com wrote: I had reported this issue a while back, hoping that it was something with my environment, but that doesn't seem to be the case. I am getting the following stack trace on certain facet queries. Previously when I did an optimize the error went away, does anyone have any insight into why specifically this could be happening? May 1, 2012 8:48:52 PM org.apache.solr.common.SolrException log

SEVERE: java.lang.NullPointerException
        at org.apache.lucene.index.DocTermOrds.lookupTerm(DocTermOrds.java:807)
        at org.apache.solr.request.UnInvertedField.getTermValue(UnInvertedField.java:636)
        at org.apache.solr.request.UnInvertedField.getCounts(UnInvertedField.java:411)
        at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:300)
        at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:396)
        at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:205)
        at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:81)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:204)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1550)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
        at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
        at org.eclipse.jetty.server.Server.handle(Server.java:351)
        at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
        at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
        at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:900)
        at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:954)
        at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:857)
        at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
        at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
        at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
        at java.lang.Thread.run(Thread.java:662)
Re: Boosting documents based on search term/phrase
Perfect, this is working well. On Tue, May 1, 2012 at 5:33 PM, Jeevanandam je...@myjeeva.com wrote: Yes, you can add it in the last-components section on the default query handler:

<arr name="last-components">
  <str>elevator</str>
</arr>

- Jeevanandam On 02-05-2012 3:53 am, Donald Organ wrote: query elevation was exactly what I was talking about. Now is there a way to add this to the default query handler? On Tue, May 1, 2012 at 4:26 PM, Jack Krupansky j...@basetechnology.com wrote: Do you mean besides query elevation? http://wiki.apache.org/solr/QueryElevationComponent And besides explicit boosting by the user (the ^ suffix operator after a term/phrase)? -- Jack Krupansky -Original Message- From: Donald Organ Sent: Tuesday, May 01, 2012 3:59 PM To: solr-user Subject: Boosting documents based on search term/phrase Is there a way to boost documents based on the search term/phrase?
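For completeness, the elevator component also relies on an elevate.xml file mapping query strings to the documents to force to the top. A minimal sketch (the query text and doc ids here are made up):

```xml
<!-- elevate.xml: ids are placeholders for real uniqueKey values -->
<elevate>
  <query text="hard drive">
    <doc id="SP2514N"/>
    <doc id="IW-02" exclude="true"/>  <!-- exclude pushes a doc out entirely -->
  </query>
</elevate>
```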
Re: Latest solr4 snapshot seems to be giving me a lot of unhappy logging about 'Log4j', should I be concerned?
check a release since r1332752 If things still look problematic, post a comment on: https://issues.apache.org/jira/browse/SOLR-3426 this should now have a less verbose message with an older SLF4j and with Log4j On Tue, May 1, 2012 at 10:14 AM, Gopal Patwa gopalpa...@gmail.com wrote: I have a similar issue using log4j for logging with the trunk build; the CoreContainer class prints a big stack trace on our jboss 4.2.2 startup. I am using slf4j 1.5.2 10:07:45,918 WARN [CoreContainer] Unable to read SLF4J version java.lang.NoSuchMethodError: org.slf4j.impl.StaticLoggerBinder.getSingleton()Lorg/slf4j/impl/StaticLoggerBinder; at org.apache.solr.core.CoreContainer.load(CoreContainer.java:395) at org.apache.solr.core.CoreContainer.load(CoreContainer.java:355) at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:304) at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:101) On Tue, May 1, 2012 at 9:25 AM, Benson Margulies bimargul...@gmail.com wrote: On Tue, May 1, 2012 at 12:16 PM, Mark Miller markrmil...@gmail.com wrote: There is a recent JIRA issue about keeping the last n logs to display in the admin UI. That introduced a problem - and then the fix introduced a problem - and then the fix mitigated the problem but left that ugly logging as a by-product. Don't remember the issue # offhand. I think there was a dispute about what should be done with it. On May 1, 2012, at 11:14 AM, Benson Margulies wrote: CoreContainer.java, in the method 'load', finds itself calling loader.newInstance with an 'fname' of Log4j if the slf4j backend is 'Log4j'. Couldn't someone just fix the if statement to say, 'OK, if we're doing log4j, we have no log watcher' and skip all the loud failing on the way? e.g.: 2012-05-01 10:40:32,367 org.apache.solr.core.CoreContainer - Unable to load LogWatcher org.apache.solr.common.SolrException: Error loading class 'Log4j' What is it actually looking for? Have I misplaced something? - Mark Miller lucidimagination.com
Re: Ampersand issue
If your json value is &amp; the proper xml value is &amp;amp;. What is the value you are setting on the stored field? Is it & or &amp;? On Mon, Apr 30, 2012 at 12:57 PM, William Bell billnb...@gmail.com wrote: One idea was to wrap the field with CDATA. Or base64 encode it. On Fri, Apr 27, 2012 at 7:50 PM, Bill Bell billnb...@gmail.com wrote: We are indexing a simple XML field from SQL Server into Solr as a stored field. We have noticed that the &amp; is outputted as &amp;amp; when using wt=XML. When using wt=JSON we get the normal &amp;. Is there a way to indicate that we don't want to encode the field, since it is already XML, when using wt=XML? Bill Bell Sent from mobile -- Bill Bell billnb...@gmail.com cell 720-256-8076
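The doubled entity is just standard XML escaping applied to a value that was stored already escaped. A quick sketch of what the XML response writer effectively does to a stored value (field contents here are made up):

```python
# Sketch of why wt=XML shows &amp;amp;: the writer escapes whatever is
# stored, including any ampersands that are already part of an entity.
from xml.sax.saxutils import escape

stored = "Barnes &amp; Noble"   # value stored already XML-encoded
print(escape(stored))           # Barnes &amp;amp; Noble  (double-escaped)

raw = "Barnes & Noble"          # store the raw value instead...
print(escape(raw))              # Barnes &amp; Noble  (escaped exactly once)
```

So if the stored field already contains entities, the JSON writer hands them back verbatim while the XML writer escapes them a second time; storing the raw, unescaped XML avoids the doubling.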
Re: Latest solr4 snapshot seems to be giving me a lot of unhappy logging about 'Log4j', should I be concerned?
Yes, I'm the author of that JIRA. On Tue, May 1, 2012 at 8:45 PM, Ryan McKinley ryan...@gmail.com wrote: check a release since r1332752 If things still look problematic, post a comment on: https://issues.apache.org/jira/browse/SOLR-3426 this should now have a less verbose message with an older SLF4j and with Log4j On Tue, May 1, 2012 at 10:14 AM, Gopal Patwa gopalpa...@gmail.com wrote: I have similar issue using log4j for logging with trunk build, the CoreConatainer class print big stack trace on our jboss 4.2.2 startup, I am using sjfj 1.5.2 10:07:45,918 WARN [CoreContainer] Unable to read SLF4J version java.lang.NoSuchMethodError: org.slf4j.impl.StaticLoggerBinder.getSingleton()Lorg/slf4j/impl/StaticLoggerBinder; at org.apache.solr.core.CoreContainer.load(CoreContainer.java:395) at org.apache.solr.core.CoreContainer.load(CoreContainer.java:355) at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:304) at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:101) On Tue, May 1, 2012 at 9:25 AM, Benson Margulies bimargul...@gmail.comwrote: On Tue, May 1, 2012 at 12:16 PM, Mark Miller markrmil...@gmail.com wrote: There is a recent JIRA issue about keeping the last n logs to display in the admin UI. That introduced a problem - and then the fix introduced a problem - and then the fix mitigated the problem but left that ugly logging as a by product. Don't remember the issue # offhand. I think there was a dispute about what should be done with it. On May 1, 2012, at 11:14 AM, Benson Margulies wrote: CoreContainer.java, in the method 'load', finds itself calling loader.NewInstance with an 'fname' of Log4j of the slf4j backend is 'Log4j'. Couldn't someone just fix the if statement to say, 'OK, if we're doing log4j, we have no log watcher' and skip all the loud failing on the way? 
e.g.: 2012-05-01 10:40:32,367 org.apache.solr.core.CoreContainer - Unable to load LogWatcher org.apache.solr.common.SolrException: Error loading class 'Log4j' What is it actually looking for? Have I misplaced something? - Mark Miller lucidimagination.com
Looking for a way to separate MySQL query from DIH data-config.xml
Hello everyone, I have a working DIH setup with a couple of long and complicated MySQL queries in data-config.xml. To make it easier/safer for myself and other developers in my company to edit the MySQL query, I'd like to remove it from data-config.xml, store it in a separate file, and then reference it from data-config.xml. Is there anyone who's currently doing this and could share what method was used to accomplish it? At some point on this list I saw someone mention that they had done just what I'm trying to do by putting the query in a separate SQL file as a MySQL stored procedure, and then calling that procedure from the query="" portion of data-config.xml, but I don't quite understand how/at what point that SQL file with the stored procedure would be read by DIH. Does anyone know how this would be done, or have any other suggestions for how to move the query into a separate document? Thanks in advance, Peter
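One hedged sketch of the stored-procedure approach described above, with made-up names: the CREATE PROCEDURE lives in MySQL itself — you load the .sql file once with the mysql client (e.g. mysql mydb < queries.sql), so DIH never reads the file at all; data-config.xml only ever issues the CALL:

```xml
<!-- data-config.xml sketch: the entity calls a procedure defined inside
     MySQL. "get_documents", the connection details, and the column names
     are all placeholders. -->
<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb"
              user="solr" password="secret"/>
  <document>
    <entity name="doc" query="CALL get_documents()">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
    </entity>
  </document>
</dataConfig>
```

Editing the query then means editing and re-running the .sql file against MySQL, with no change to data-config.xml.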
Re: Error with distributed search and Suggester component (Solr 3.4)
On Tue, May 1, 2012 at 6:48 PM, Ken Krugler kkrugler_li...@transpac.com wrote: Hi list, Does anybody know if the Suggester component is designed to work with shards? I'm not really sure it is? They would probably have to override the default merge implementation specified by SpellChecker. But, all of the current suggesters pump out over 100,000 QPS on my machine, so I'm wondering what the usefulness of this is? And if it was useful, merging results from different machines is pretty inefficient, for suggest you would shard by term instead so that you need only contact a single host? -- lucidimagination.com