facet total score instead of total count

2010-07-28 Thread Bharat Jain
Hi,
   I have a requirement where I want to sum up the scores of the faceted
fields. This will decide the relevancy for us. Is there a way to do it on
a facet field? Basically, instead of giving the count of records for a facet
field, I would like to have the total sum of scores for those records.

Any help is greatly appreciated.

Thanks
Bharat Jain


logic required for newbie

2010-07-28 Thread Jonty Rhods
Hi All,

I am very new and learning Solr.

I have 10 columns like the following in a table:

1. id
2. name
3. user_id
4. location
5. country
6. landmark1
7. landmark2
8. landmark3
9. landmark4
10. landmark5

when a user searches for a landmark, I want to return only the one landmark
that matches. The rest of the landmarks should be ignored.
The expected result looks like the following if the user searches by landmark2:

1. id
2. name
3. user_id
4. location
5. country
7. landmark2

or like the following if searched by landmark9:

1. id
2. name
3. user_id
4. location
5. country
9. landmark9


please help me to design the schema for this kind of requirement...
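One possible direction, assuming a single searchable field is wanted while keeping the individual columns stored so the matching one can be returned (field and type names here are illustrative, not a prescribed design):

```xml
<!-- store each landmark column separately so the matching one can be returned -->
<dynamicField name="landmark*" type="text" indexed="true" stored="true"/>
<!-- copy all landmarks into one field so a single query matches any of them -->
<field name="landmarks_all" type="text" indexed="true" stored="false" multiValued="true"/>
<copyField source="landmark*" dest="landmarks_all"/>
```

With highlighting enabled on `landmark*`, the response indicates which of the stored landmark fields actually matched.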

thanks
with regards


Re: question about relevance

2010-07-28 Thread Bharat Jain
Well, you are correct, Erick, that this is a database-ish thing to try to
achieve in Solr, and unfortunately the sin :) had been committed by somebody
else :) and now we are running into relevancy issues.

Let me try to state the problem more casually.

1. There are user records of type A, B, C etc. (userId field in index is
common to all records)
2. A user can have any number of A, B, C, etc. (e.g. think of A being a
language; then a user can know many languages like French, English, German, etc.)
3. Records are currently stored as a document in index.
4. A given query can match multiple records for the user
5. If more records are matched for a user (e.g. if he knows both French and
German) then he is more relevant and should come to the top in the UI. This is
the reason I wanted to add the Lucene scores, assuming a greater score means
more relevance.

Hope you got what I was saying.

Another idea for this situation is doing faceting on the userId field and then
adding the scores, but currently I think Lucene only supports facet counts;
basically Solr will give you only the count of docs it matched. Can I get the
sum of the scores of the documents that matched?
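Absent built-in facet score sums, one client-side workaround is to aggregate the scores per userId from the returned documents. A minimal sketch with plain maps (the field names and data shapes are invented for illustration, not a Solr API):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ScoreAggregator {
    /**
     * Sums the "score" entry of each document map, grouped by its
     * "userId" entry -- the client-side equivalent of the facet
     * score sum asked about above.
     */
    public static Map<String, Double> sumScoresByUser(List<Map<String, Object>> docs) {
        Map<String, Double> totals = new HashMap<>();
        for (Map<String, Object> doc : docs) {
            String userId = (String) doc.get("userId");
            double score = ((Number) doc.get("score")).doubleValue();
            totals.merge(userId, score, Double::sum); // accumulate per user
        }
        return totals;
    }
}
```

Sorting the resulting map entries by value then gives the "most relevant user first" ordering.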


Thanks
Bharat Jain


On Tue, Jul 27, 2010 at 5:58 AM, Erick Erickson erickerick...@gmail.com wrote:

 I'm having trouble getting my head around what you're trying to accomplish,
 so if this is off base you know why <g>.

 But what it smells like is that you're trying to do database-ish things in
 a SOLR index, which is almost always the wrong approach. Is there a
 way to index redundant data with each document so all you have to do
 to get the relevant users is a simple query?

 Adding scores is also suspect... I don't see how that does predictable
 things.

 But I'm also failing completely to understand what a relevant user is.

 not much help, if this is way off base perhaps you could provide some
 additional use-cases?

 Best
 Erick

 On Mon, Jul 26, 2010 at 2:37 AM, Bharat Jain bharat.j...@gmail.com
 wrote:

  Hello All,
 
  I have a index which store multiple objects belonging to a user
 
  for e.g.
  <schema>
    <field name="objType" type="..." /> <!-- identifies user object type, e.g. userBasic or userAdv -->

    <!-- obj 1: maps to userBasicInfoObject -->
    <field name="first_name" type="..." />
    <field name="last_name" type="..." />

    <!-- obj 2: maps to userAdvInfoObject -->
    <field name="user_data_1" type="..." />
    <field name="user_data_2" type="..." />
  </schema>
 
 
  Now when I am doing some query I get multiple records, mapping to Java
  objects (identified by objType), that belong to the same user.
 
 
  Now I want to show the relevant users at the top of the list. I am thinking
  of adding the Lucene scores of the different result documents to get the best
  scores. Is this the correct approach to get the relevance of the user?
 
  Thanks
  Bharat Jain
 



Re: Any tips/guidelines to turning the Solr/luence performance in a master/slave/sharding environment

2010-07-28 Thread Tommaso Teofili
Hi,
I think the starting point should be :
http://wiki.apache.org/solr/SolrPerformanceFactors
For example you could start playing with the mergeFactor parameter.
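For reference, mergeFactor lives in solrconfig.xml; a sketch of where it goes (the value shown is just the common default, tune it for your workload):

```xml
<indexDefaults>
  <!-- lower values mean fewer segments (faster search, slower indexing) -->
  <mergeFactor>10</mergeFactor>
</indexDefaults>
```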
My 2 cents,
Tommaso

2010/7/27 Chengyang atreey...@163.com

 How to reduce the index file size, decrease the sync time between the
 nodes, and decrease the index create/update time.
 Thanks.




Re: Extracting PDF text/comment/callout/typewriter boxes with Solr CELL/Tika/PDFBox

2010-07-28 Thread Tommaso Teofili
I attached a patch for Solr 1.4.1 release on
https://issues.apache.org/jira/browse/SOLR-1902 that made things work for
me.
This strange behaviour for me was due to the fact that I copied the patched
jars and war inside the dist directory but forgot to update the war inside
the example/webapps directory (that is inside Jetty).
Hope this helps.
Tommaso

2010/7/27 David Thibault dthiba...@esperion.com

 Alessandro & all,

 I was having the same issue with Tika crashing on certain PDFs.  I also
 noticed the bug where no content was extracted after upgrading Tika.

 When I went to the SOLR issue you link to below, I applied all the patches,
 downloaded the Tika 0.8 jars, restarted tomcat, posted a file via curl, and
 got the following error:
 SEVERE: java.lang.NoSuchMethodError:
 org.apache.solr.core.SolrResourceLoader.getClassLoader()Ljava/lang/ClassLoader;
 at
 org.apache.solr.handler.extraction.ExtractingRequestHandler.inform(ExtractingRequestHandler.java:93)
 at
 org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:244)
 at
 org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:231)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
 at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
 at
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
 at
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 at
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
 at
 org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:859)
 at
 org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:579)
 at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1555)
 at java.lang.Thread.run(Thread.java:619)

 This is really weird because I DID apply the SolrResourceLoader patch that
 adds the getClassLoader method.  I even verified by opening up the
 JARs and looking at the class file in Eclipse... I can see the
 SolrResourceLoader.getClassLoader() method.

 Does anyone know why it can't find the method?  After patching the source I
 did ant clean dist in the base directory of the Solr source tree, and
 everything looked like it compiled (BUILD SUCCESSFUL).  Then I copied all
 the jars from dist/ and all the library dependencies from
 contrib/extraction/lib/ into my SOLR_HOME. Restarting tomcat, everything in
 the logs looked good.

 I'm stumped.  It would be very nice to have a Solr implementation using the
 newest versions of PDFBox & Tika and actually have content being
 extracted... =)

 Best,
 Dave


 -Original Message-
 From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com]
 Sent: Tuesday, July 27, 2010 6:09 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Extracting PDF text/comment/callout/typewriter boxes with Solr
 CELL/Tika/PDFBox

 Hi Jon,
 During the last few days we faced the same problem.
 Using Solr 1.4.1 classic (Tika 0.4), from some PDF files we can't extract
 content, and from others Solr throws an exception during the indexing
 process.
 You must:
 update the Tika libraries (in /contrib/extraction/lib) with tika-core 0.8
 snapshot and tika-parsers 0.8;
 update PDFBox and all related libraries.
 After that you have to patch Solr 1.4.1 with this patch:

 https://issues.apache.org/jira/browse/SOLR-1902?page=com.atlassian.jira.plugin.ext.subversion%3Asubversion-commits-tabpanel
 This is the first way to solve the problem.

 Using Solr 1.4.1 (with Tika 0.8 snapshot and PDFBox updated), no exception
 is thrown during the indexing process, but no content is extracted.
 Using the latest Solr trunk (with Tika 0.8 snapshot and PDFBox updated), all
 sounds good, but we don't know how stable it is!
 I hope you now have a clear vision of this issue,
 Best Regards



 2010/7/26 Sharp, Jonathan jsh...@coh.org

 
  Every so often I need to index new batches of scanned PDFs and
 occasionally
  Adobe's OCR can't recognize the text in a couple of these documents. In
  these situations I would like to type in a small amount of text onto the
  document and have it be extracted by Solr CELL.
 
  Adobe Pro 9 has a number of different ways to add text directly to a PDF
  file:
 
  *Typewriter
  *Sticky Note
  *Callout boxes
  *Text boxes
 
  I tried indexing documents with each of these text additions 

Re: SpatialSearch: sorting by distance

2010-07-28 Thread Pavel Minchenkov
Does anybody know if this feature works correctly?
Or am I doing something wrong?

2010/7/27 Pavel Minchenkov char...@gmail.com

 Hi,

 I'm trying to sort by distance like this:

 sort=dist(2,lat,lon,55.755786,37.617633) asc

 In general results are sorted, but some documents are not in the right order.
 I'm using DistanceUtils.getDistanceMi(...) from lucene spatial to calculate
 real distance after reading documents from Solr.

 Solr version from trunk.

 <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0"
  omitNorms="true" positionIncrementGap="0"/>
 <field name="lat" type="double" indexed="true" stored="true"/>
 <field name="lon" type="double" indexed="true" stored="true"/>
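For client-side verification of the ordering, a haversine sketch along the lines of what DistanceUtils.getDistanceMi computes (the Earth-radius constant here is an assumption; distances in miles):

```java
public class Haversine {
    private static final double EARTH_RADIUS_MILES = 3958.8; // mean radius, assumed

    /** Great-circle distance in miles between two lat/lon points (in degrees). */
    public static double distanceMiles(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                   * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return EARTH_RADIUS_MILES * 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
    }
}
```

Comparing these values against the order Solr returns makes it easy to see which documents are misplaced.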

 Thanks.

 --
 Pavel Minchenkov




-- 
Pavel Minchenkov


Re: Integration Problem

2010-07-28 Thread Jörg Wißmeier
Nobody out there who can help me with this problem?

I need to edit the result of the javabin writer (adding the results from
the webservice).
I hope it is possible to do that.

thanks in advance.

Am Mo 26.07.2010 10:25 schrieb Jörg Wißmeier :

Hi everybody,

I have been working with Solr for a while, and I have integrated it with
Liferay 6.0.3, so every search request from Liferay is processed by Solr
and its index.
But I have to integrate another system, and this system offers me a
webservice. The results of this webservice should be in the results of
Solr, but not in its index.
I tried to do that with a custom query handler and a custom response
writer, and I'm able to write into the response message of Solr, but only
into the response node of the XML message and not into the results node.
So is there any solution for how I could write into the results node of the
XML message from Solr?

thanks in advance

best regards
joerg






Kind regards,


Jörg Wißmeier


___
Ancud IT-Beratung GmbH
Glockenhofstr. 47
90478 Nürnberg
Germany

T +49 911 25 25 68-0
F +49 911 25 25 68-68
joerg.wissme...@ancud.de
www.ancud.de

Information pursuant to EHUG:
Ancud IT-Beratung GmbH, Nürnberg; Geschäftsführer Konstantin Böhm;
Amtsgericht Nürnberg, HRB 19954



solr log file rotation

2010-07-28 Thread Christos Constantinou
Hi all,

I am running a Solr 1.4 instance on FreeBSD that generates large log files in
very short periods. I used /etc/newsyslog to configure log file rotation;
however, once the log file is rotated, Solr doesn't write logs to the new
file. I'm wondering if there is a way to let Solr know that the log file will
be rotated so that it recreates a correct file handle?
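If the Solr instance logs through java.util.logging, one workaround is to let the JDK FileHandler rotate the files itself instead of newsyslog, so the file handle is always valid. A sketch of a logging.properties (the path and limits are illustrative):

```properties
handlers = java.util.logging.FileHandler
# %g is the generation number; rotate at ~10 MB, keep 10 files
java.util.logging.FileHandler.pattern = /var/log/solr/solr.%g.log
java.util.logging.FileHandler.limit = 10000000
java.util.logging.FileHandler.count = 10
java.util.logging.FileHandler.append = true
```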

Thanks

Christos

Re: Spellchecking and frequency

2010-07-28 Thread dan sutton
Hi Mark,

Thanks for that info; it looks very interesting, and it would be great to see
your code. Out of interest, did you use the dictionary and the phonetic file?
Did you see better results with both?

Regarding the secondary part that checks the corpus for matching suggestions,
would another way to do this be to have an event listener that listens for
commits and then builds the dictionary from matching corpus words? That way
you avoid the performance hit at query time.

Cheers,
Dan

On Tue, Jul 27, 2010 at 7:04 PM, Mark Holland mark.holl...@zoopla.co.uk wrote:

 Hi,

 I found the suggestions returned from the standard Solr spellcheck not to be
 that relevant. By contrast, aspell, given the same dictionary and misspelled
 words, gives much more accurate suggestions.

 I therefore wrote an implementation of SolrSpellChecker that wraps jazzy,
 the Java aspell library. I also extended the SpellCheckComponent to take the
 matrix of suggested words and query the corpus to find the first combination
 of suggestions which returned a match. This works well for my use case,
 where term frequency is irrelevant to spelling or scoring.

 I'd like to publish the code in case someone finds it useful (although it's
 a bit crude at the moment and will need a decent tidy up). Would it be
 appropriate to open up a Jira issue for this?

 Cheers,
 ~mark

 On 27 July 2010 09:33, dan sutton danbsut...@gmail.com wrote:

  Hi,
 
   I've recently been looking into spellchecking in Solr, and was struck by
   how limited the usefulness of the tool was.
  
   Like most corpora, ours contains lots of different spelling mistakes for
   the same word, so 'spellcheck.onlyMorePopular' is not really that useful
   unless you click on it numerous times.
 
   I was thinking that, since most of the time people spell words correctly,
   why is there no other frequency parameter that could enter into the score?
   i.e. something like:
  
   spell_score ~ edit_dist * freq
  
   I'm sure others have come across this issue and was wondering what
   steps/algorithms they have used to overcome these limitations?
 
  Cheers,
  Dan
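One way to sketch the edit_dist * freq idea above: rank candidate corrections by combining Levenshtein edit distance with corpus frequency. The combining formula here is just one illustrative choice, not something Solr provides:

```java
public class SpellScore {
    /** Classic Levenshtein edit distance via dynamic programming. */
    public static int editDistance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        }
        return d[a.length()][b.length()];
    }

    /** Higher is better: frequent words at a small edit distance win. */
    public static double score(String misspelled, String candidate, int corpusFreq) {
        return corpusFreq / (1.0 + editDistance(misspelled, candidate));
    }
}
```

A frequent correction a couple of edits away can then outrank a rare one that happens to be closer, which is the behaviour the formula above is after.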
 



Re: Indexing Problem: Where's my data?

2010-07-28 Thread Chantal Ackermann
make sure to set stored="true" on every field you expect to be returned
in your results for later display.
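For example, in schema.xml (the field name here is illustrative):

```xml
<field name="title" type="text" indexed="true" stored="true"/>
```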

Chantal




Re: DIH : SQL query (sub-entity) is executed although variable is not set (null or empty list)

2010-07-28 Thread Chantal Ackermann
Hi Lance!

On Wed, 2010-07-28 at 02:31 +0200, Lance Norskog wrote:
 Should this go into the trunk, or does it only solve problems unique
 to your use case?

The solution is generic but is an extension of XPathEntityProcessor
because I didn't want to touch the solr.war. This way I can deploy the
extension into SOLR_HOME/lib.
The problem that it solves is not one with XPathEntityProcessor but more
general. What it does:

It adds an attribute to the entity that I called skipIfEmpty, which
takes a variable (it could even take more variables separated by
whitespace).
On entityProcessor.init(), which is called for sub-entities per row of the
root entity (i.e. before every new request to the data source), the value
of the attribute is resolved, and if it is null or empty (after
trimming), the entity is not processed further.
This attribute is only allowed on sub-entities.
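For illustration, a hypothetical data-config fragment using the attribute (the entity names and the variable are invented):

```xml
<entity name="parent" processor="XPathEntityProcessor" ...>
  <!-- only fetched when ${parent.detailUrl} resolves to a non-empty value -->
  <entity name="detail"
          processor="OptionalXPathEntityProcessor"
          skipIfEmpty="${parent.detailUrl}"
          url="${parent.detailUrl}">
    ...
  </entity>
</entity>
```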

It would probably be nicer to put that somewhere higher up in the class
hierarchy so that all entity processors could make use of it.
But I don't know how common the use case is - all the examples I found were
more or less joins on primary keys.

Cheers,
Chantal

Here comes the code ==========

import static
org.apache.solr.handler.dataimport.DataImportHandlerException.SEVERE;

import java.util.Map;
import java.util.logging.Logger;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.DataImportHandlerException;
import org.apache.solr.handler.dataimport.XPathEntityProcessor;

public class OptionalXPathEntityProcessor extends XPathEntityProcessor {
    private Logger log =
        Logger.getLogger(OptionalXPathEntityProcessor.class.getName());
    private static final String SKIP_IF_EMPTY = "skipIfEmpty";
    private boolean skip = false;

    @Override
    protected void firstInit(Context context) {
        // skipIfEmpty makes no sense on a root entity, so fail fast
        if (context.isRootEntity()) {
            throw new DataImportHandlerException(SEVERE,
                "OptionalXPathEntityProcessor not allowed for root entities.");
        }
        super.firstInit(context);
    }

    @Override
    public void init(Context context) {
        // resolve the skipIfEmpty variable for this row of the parent entity
        String value = context.getResolvedEntityAttribute(SKIP_IF_EMPTY);
        if (value == null || value.trim().isEmpty()) {
            skip = true;
        } else {
            super.init(context);
            skip = false;
        }
    }

    @Override
    public Map<String, Object> nextRow() {
        if (skip) return null;
        return super.nextRow();
    }
}




Solr using 1500 threads - is that normal?

2010-07-28 Thread Christos Constantinou
Hi,

Solr seems to be crashing after a JVM exception that new threads cannot be
created. I am writing in the hope of advice from someone who has experienced
this before. The exception that is causing the problem is:

Exception in thread btpool0-5 java.lang.OutOfMemoryError: unable to create 
new native thread

The memory that is allocated to Solr is 3072MB, which should be enough memory 
for a ~6GB data set. The documents are not big either, they have around 10 
fields of which only one stores large text ranging between 1k-50k.

The top command at the time of the crash shows Solr using around 1500 threads,
which I assume is not normal. Could it be that the threads are crashing one
by one and new ones are created to cope with the queries?

In the log file, right after the exception, there are several thousand 
commits before the server stalls completely. Normally, the log file would 
report 20-30 document existence queries per second, then 1 commit per 5-30 
seconds, and some more infrequent faceted document searches on the data. 
However after the exception, there are only commits until the end of the log 
file.

I am wondering if anyone has experienced this before, or if it is some sort of
known bug in Solr 1.4? Is there a way to increase the detail of the
exception in the logfile?

I am attaching the output of a grep Exception command on the logfile.

Jul 28, 2010 8:19:31 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:19:31 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:19:31 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:19:32 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:20:18 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:20:48 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:22:43 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:27:53 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:27:53 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:27:53 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:28:50 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:33:19 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:35:08 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:35:58 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:35:59 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:44:31 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:51:49 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:55:17 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:55:17 AM 

Re: Strange search

2010-07-28 Thread stockii

Try to delete solr.SnowballPorterFilterFactory from your analyzer chain. I
had similar problems using the German SnowballPorterFilterFactory.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Strange-search-tp998961p1001990.html
Sent from the Solr - User mailing list archive at Nabble.com.


SolrJ Response + JSON

2010-07-28 Thread MitchK

Hello community,

I need to transform SolrJ responses into JSON after some computing on
those results by another application has finished.

I cannot do those computations on the Solr side.

So I really have to translate SolrJ's output into JSON.

Any experience with how to do so without writing your own JSON writer?

Thank you.
- Mitch
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrJ-Response-JSON-tp1002024p1002024.html
Sent from the Solr - User mailing list archive at Nabble.com.


Get unique values

2010-07-28 Thread Rafal Bluszcz Zawadzki
Hi,

In my schema I have (inter alia) the fields CollectionID and CollectionName.
These two values always match together, which means that for every value of
CollectionID there is a matching value of CollectionName.

I am interested in a query which allows me to get the unique values of
CollectionID with the matching CollectionNames (the rest of the fields are
not of interest to me in this query).

I was thinking about facets, but they offer a bit more than I need.

Does anyone have an idea for a query which would allow me to get these
results?
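For what it's worth, a facet query along these lines returns the distinct CollectionID values without any document rows (the host and core in the URL are illustrative; getting the matching CollectionName would still need a second lookup or a combined ID|Name field):

```
http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=CollectionID&facet.limit=-1
```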

Cheers,

-- 
Rafał Zawadzki
http://dev.bluszcz.net


Highlighted match snippets highlight non-matched words (such as 0.1 and 0.2)

2010-07-28 Thread Jon Cram
Hi,

 

I'm observing some strange highlighted words in field value snippets
returned from Solr when matched term highlighting
(http://wiki.apache.org/solr/HighlightingParameters) is enabled.

 

In some cases, highlighted field value snippets contain highlighted
words that are not matches:

-  this appears to be in addition to highlighting words that are
matches

-  these non-match highlighted words are not pre-highlighted in
the indexed content

-  I've determined these are non-matches by appending
debugQuery=1 to the URL and examining the match detail information

 

I've so far observed this in relation to the strings "0", "0.1", "0.2"
and "0.4" in indexed content.

 

Real life example when searching for [gas]:

 

Relevant matched document result from Solr:

<doc>
  <str name="description">
    EXAMPLE prepares an extensive range of traceable calibration gas
    standards with guaranteed relative uncertainties levels of 0.1% for
    certain species (PDF 676 KB).
  </str>
</doc>

 

Related highlighted snippet:

<lst name="7232">
  <arr name="description">
    <str>
      EXAMPLE prepares an extensive range of traceable calibration
      <em>gas</em> standards with guaranteed relative uncertainties levels of
      <em>0.1</em>% for certain species (PDF 676 KB).
    </str>
  </arr>
</lst>

 

Note how the highlight snippet correctly highlights "gas" and
incorrectly highlights "0.1". I've observed similar results for other
searches where the indexed content contains "0", "0.1", "0.2" and "0.4"
and where these numbers are highlighted incorrectly.

 

At this stage I'm trying to determine if this is due to a poor
implementation on my behalf or whether this is a bug in Solr.

 

I'd really like to know if:

 

1.   Anyone else has observed this behaviour

2.   If this might be a known issue with Solr (I've tried to find
out but haven't had any luck)

3.   Anyone can test using something like
http://<solr>/select?hl=true&hl.fl=*&q=(phrase+that+contains+0.1+in+response)&hl.fragsize=0

 

Thanks,

Jon Cram

 



Re: clustering component

2010-07-28 Thread Stanislaw Osinski
 The patch should also work with trunk, but I haven't verified it yet.


I've just added a patch against solr trunk to
https://issues.apache.org/jira/browse/SOLR-1804.

S.


Show elevated Result Differently

2010-07-28 Thread Vishal.Arora

I want to show elevated results differently from the others. Is there any way
to do this?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Show-elevated-Result-Differently-tp1002081p1002081.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrJ Response + JSON

2010-07-28 Thread Mark Allan
I think you should just be able to add wt=json to the end of your  
query (or change whatever the existing wt parameter is in your URL).


Mark

On 28 Jul 2010, at 12:54 pm, MitchK wrote:



Hello community,

I need to transform SolrJ - responses into JSON, after some  
computing on

those results by another application has finished.

I can not do those computations on the Solr - side.

So, I really have to translate SolrJ's output into JSON.

Any experiences how to do so without writing your own JSON-writer?

Thank you.
- Mitch
--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrJ-Response-JSON-tp1002024p1002024.html
Sent from the Solr - User mailing list archive at Nabble.com.




--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



SolrJ Response + JSON

2010-07-28 Thread MitchK

Hello,

Second try to send a mail to the mailing list...

I need to translate SolrJ's response into a JSON response.
I cannot query Solr directly, because I need to do some math with the
response data before I show the results to the client.

Any experience with how to translate SolrJ's response into JSON without
writing your own JSON writer?

Thank you.
- Mitch
- Mitch
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrJ-Response-JSON-tp1002115p1002115.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrJ Response + JSON

2010-07-28 Thread Mark Allan


On 28 Jul 2010, at 2:08 pm, MitchK wrote:

Second try to send a mail to the mailing list...


Your first attempt got through as well.  Here's my original response.


I think you should just be able to add wt=json to the end of your  
query (or change whatever the existing wt parameter is in your URL).


Mark

On 28 Jul 2010, at 12:54 pm, MitchK wrote:



Hello community,

I need to transform SolrJ - responses into JSON, after some  
computing on

those results by another application has finished.

I can not do those computations on the Solr - side.

So, I really have to translate SolrJ's output into JSON.

Any experiences how to do so without writing your own JSON-writer?

Thank you.
- Mitch
--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrJ-Response-JSON-tp1002024p1002024.html
Sent from the Solr - User mailing list archive at Nabble.com.





--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



Re: SolrJ Response + JSON

2010-07-28 Thread Markus Jelsma
Hi,

I got a response to your e-mail in my box 30 minutes ago. Anyway, enable the
JSONResponseWriter, if you haven't already, and query with wt=json. It can't
get much easier.

Cheers,

On Wednesday 28 July 2010 15:08:26 MitchK wrote:
 Hello ,
 
 Second try to send a mail to the mailing list...
 
 I need to translate SolrJ's response into JSON-response.
 I can not query Solr directly, because I need to do some math with the
 responsed data, before I show the results to the client.
 
 Any experiences how to translate SolrJ's response into JSON without writing
 your own JSON Writer?
 
 Thank you.
 - Mitch
 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



Re: SolrJ Response + JSON

2010-07-28 Thread MitchK

Thank you Markus, Mark.

Seems to be a problem with Nabble, not with the mailing list. Sorry.

I can create a JSON response when I query Solr directly.
But I mean that I query Solr through a SolrJ client
(CommonsHttpSolrServer).
That means my queries look a little bit like this:
http://wiki.apache.org/solr/Solrj#Reading_Data_from_Solr

So the response is shown as a QueryResponse object, not as a JSON string.

Or do I miss something here?
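Absent a ready-made client-side writer, one option is a small serializer over the flat field maps you can pull out of the QueryResponse documents. This sketch handles only string and numeric values and is an illustration, not SolrJ API:

```java
import java.util.List;
import java.util.Map;

public class SimpleJsonWriter {
    /** Serializes a list of flat field maps as a JSON array of objects. */
    public static String toJson(List<Map<String, Object>> docs) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < docs.size(); i++) {
            if (i > 0) sb.append(",");
            sb.append("{");
            boolean first = true;
            for (Map.Entry<String, Object> e : docs.get(i).entrySet()) {
                if (!first) sb.append(",");
                first = false;
                sb.append("\"").append(escape(e.getKey())).append("\":");
                Object v = e.getValue();
                if (v instanceof Number) sb.append(v); // numbers unquoted
                else sb.append("\"").append(escape(String.valueOf(v))).append("\"");
            }
            sb.append("}");
        }
        return sb.append("]").toString();
    }

    private static String escape(String s) {
        // minimal escaping; a real writer would also handle control chars
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

For anything beyond flat scalar fields, a general-purpose JSON library is the safer route.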

On 28.07.2010 15:15, Markus Jelsma wrote:

Hi,

I got a response to your e-mail in my box 30 minutes ago. Anyway, enable the
JSONResponseWriter, if you haven't already, and query with wt=json. Can't get
mucht easier.

Cheers,

On Wednesday 28 July 2010 15:08:26 MitchK wrote:
   

Hello ,

Second try to send a mail to the mailing list...

I need to translate SolrJ's response into JSON-response.
I can not query Solr directly, because I need to do some math with the
responsed data, before I show the results to the client.

Any experiences how to translate SolrJ's response into JSON without writing
your own JSON Writer?

Thank you.
- Mitch

 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


   




RE: Extracting PDF text/comment/callout/typewriter boxes with Solr CELL/Tika/PDFBox

2010-07-28 Thread David Thibault
Yesterday I got this working with version 4.0 from trunk. I haven't fully
tested it yet, but the content doesn't come through blank anymore, so that's
good. Would it be more stable to stick with 1.4.1 and your patch to get to
Tika 0.8, or to go with the 4.0 trunk version?

Best,
Dave

-Original Message-
From: Tommaso Teofili [mailto:tommaso.teof...@gmail.com] 
Sent: Wednesday, July 28, 2010 3:31 AM
To: solr-user@lucene.apache.org
Subject: Re: Extracting PDF text/comment/callout/typewriter boxes with Solr 
CELL/Tika/PDFBox

I attached a patch for Solr 1.4.1 release on
https://issues.apache.org/jira/browse/SOLR-1902 that made things work for
me.
This strange behaviour for me was due to the fact that I copied the patched
jars and war inside the dist directory but forgot to update the war inside
the example/webapps directory (that is inside Jetty).
Hope this helps.
Tommaso

 JARs and looking at the class file in Eclipse...I can see the
 SolrResourceLoader.getClassLoader() method.

 Does anyone know why it can't find the method?  After patching the source I
 did ant clean dist in the base directory of the Solr source tree and
 everything looked like it compiles (BUILD SUCCESSFUL).  Then I copied all
 the jars from dist/ and all the library dependencies from
 contrib/extraction/lib/ into my SOLR_HOME. Restarting tomcat, everything in
 the logs looked good.

 I'm stumped.  It would be very nice to have a Solr implementation using the
  newest versions of PDFBox & Tika and actually have content being
 extracted...=)

 Best,
 Dave


 -Original Message-
 From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com]
 Sent: Tuesday, July 27, 2010 6:09 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Extracting PDF text/comment/callout/typewriter boxes with Solr
 CELL/Tika/PDFBox

 Hi Jon,
  During the last few days we have faced the same problem.
  Using classic Solr 1.4.1 (Tika 0.4), from some PDF files we can't extract
  content, and from others Solr throws an exception during the indexing
  process.
  You must:
  Update the Tika libraries (in /contrib/extraction/lib) with the tika-core 0.8
  snapshot and tika-parsers 0.8.
  Update PDFBox and all related libraries.
  After that you have to patch Solr 1.4.1 with this patch:
 
  https://issues.apache.org/jira/browse/SOLR-1902?page=com.atlassian.jira.plugin.ext.subversion%3Asubversion-commits-tabpanel
  This is the first way to solve the problem.
 
  Using Solr 1.4.1 (with the Tika 0.8 snapshot and PDFBox updated) no exception
  is thrown during the indexing process, but no content is extracted.
  Using the latest Solr trunk (with the Tika 0.8 snapshot and PDFBox updated) all
  sounds good, but we don't know how stable it is!
  I hope you now have a clear picture of this issue,
 

Re: Extracting PDF text/comment/callout/typewriter boxes with Solr CELL/Tika/PDFBox

2010-07-28 Thread Alessandro Benedetti
In my opinion, the 1.4.1 version with the patch is more stable,
at least until 4.0 is released.

2010/7/28 David Thibault dthiba...@esperion.com

 Yesterday I did get this working with version 4.0 from trunk.  I haven't
 fully tested it yet, but the content doesn't come through blank anymore, so
 that's good.  Would it be more stable to stick with 1.4.1 and your patch to
 get to Tika 0.8, or to stick with the 4.0 trunk version?

 Best,
 Dave

 -Original Message-
 From: Tommaso Teofili [mailto:tommaso.teof...@gmail.com]
 Sent: Wednesday, July 28, 2010 3:31 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Extracting PDF text/comment/callout/typewriter boxes with Solr
 CELL/Tika/PDFBox

 I attached a patch for Solr 1.4.1 release on
 https://issues.apache.org/jira/browse/SOLR-1902 that made things work for
 me.
 This strange behaviour for me was due to the fact that I copied the patched
 jars and war inside the dist directory but forgot to update the war inside
 the example/webapps directory (that is inside Jetty).
 Hope this helps.
 Tommaso

 2010/7/27 David Thibault dthiba...@esperion.com

  Alessandro & all,
 
  I was having the same issue with Tika crashing on certain PDFs.  I also
  noticed the bug where no content was extracted after upgrading Tika.
 
  When I went to the SOLR issue you link to below, I applied all the
 patches,
  downloaded the Tika 0.8 jars, restarted tomcat, posted a file via curl,
 and
  got the following error:
  SEVERE: java.lang.NoSuchMethodError:
  org.apache.solr.core.SolrResourceLoader.getClassLoader()Ljava/lang/ClassLoader;
  [...]
 
  This is really weird because I DID apply the SolrResourceLoader patch
 that
  adds the getClassLoader method.  I even verified by going opening up the
  JARs and looking at the class file in Eclipse...I can see the
  SolrResourceLoader.getClassLoader() method.
 
  Does anyone know why it can't find the method?  After patching the source
 I
  did ant clean dist in the base directory of the Solr source tree and
  everything looked like it compiles (BUILD SUCCESSFUL).  Then I copied all
  the jars from dist/ and all the library dependencies from
  contrib/extraction/lib/ into my SOLR_HOME. Restarting tomcat, everything
 in
  the logs looked good.
 
  I'm stumped.  It would be very nice to have a Solr implementation using
 the
   newest versions of PDFBox & Tika and actually have content being
  extracted...=)
 
  Best,
  Dave
 
 
  -Original Message-
  From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com]
  Sent: Tuesday, July 27, 2010 6:09 AM
  To: solr-user@lucene.apache.org
  Subject: Re: Extracting PDF text/comment/callout/typewriter boxes with
 Solr
  CELL/Tika/PDFBox
 
  Hi Jon,
   During the last few days we have faced the same problem.
   Using classic Solr 1.4.1 (Tika 0.4), from some PDF files we can't extract
   content, and from others Solr throws an exception during the indexing
   process.
  You must:
  Update tika libraries (into /contrib/extraction/lib)with tika-core.0.8
  snapshot and tika-parsers 0.8.
  Update PdfBox and all related libraries.
  After that You have to patch Solr 1.4.1 following this patch :
 
 
 https://issues.apache.org/jira/browse/SOLR-1902?page=com.atlassian.jira.plugin.ext.subversion%3Asubversion-commits-tabpanel
   This is the first way to solve the problem.
 
  Using Solr 1.4.1 (with tika 0.8 snapshot and 

Re: SolrJ Response + JSON

2010-07-28 Thread Chantal Ackermann
You could use org.apache.solr.handler.JsonLoader.
That one uses org.apache.noggit.JSONParser internally.
I've used the JacksonParser with Spring.

http://json.org/ lists parsers for different programming languages.

Cheers,
Chantal

On Wed, 2010-07-28 at 15:08 +0200, MitchK wrote:
  Hello,
 
 Second try to send a mail to the mailing list... 
 
 I need to translate SolrJ's response into JSON-response.
 I can not query Solr directly, because I need to do some math with the
 responsed data, before I show the results to the client.
 
 Any experiences how to translate SolrJ's response into JSON without writing
 your own JSON Writer?
 
 Thank you. 
 - Mitch




RE: Solr 3.1 and ExtractingRequestHandler resulting in blank content

2010-07-28 Thread David Thibault
If you don't store the content then you can't do highlighting, right?  Also, 
don't you just have to switch the text field to say stored=true in your 
schema to store the text?  I don't understand why you're differentiating the 
behavior of ExtractingRequestHandler from the behavior of Solr in general.  
Doesn't ExtractingRequestHandler just pull the text out of whatever file you 
send it and then the rest of the processing happens like any other Solr post?

The bug I was experiencing was the same one that someone else brought up on the 
list yesterday in the emails entitled Extracting PDF 
text/comment/callout/typewriter boxes with Solr   CELL/Tika/PDFBox.  It ties 
back to this bug:
https://issues.apache.org/jira/browse/SOLR-1902?page=com.atlassian.jira.plugin.ext.subversion%3Asubversion-commits-tabpanel

I saw that email shortly after I sent this one to the list (it figures, doesn't 
it...=).

I tried doing what they suggested on that bug report (patching Solr 1.4.x and 
using Tika 0.8-SNAPSHOT), but the patches failed when I applied it to my Solr 
1.4.1.  They have since added a patch for Solr 1.4.1.  I haven't tried it yet.  
However, I did get it working using Solr 4.0 out of trunk (which also uses Tika 
0.8 and updated PDFBox jars).  I have yet to decide which will be more stable, 
Solr 4.0 or patched Solr 1.4.1, both of which with updated PDFbox and Tika jars.

Best,
Dave

-Original Message-
From: Lance Norskog [mailto:goks...@gmail.com]
Sent: Tuesday, July 27, 2010 8:09 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr 3.1 and ExtractingRequestHandler resulting in blank content

There are two different datasets that Solr (Lucene really) saves from
a document: raw storage and the indexed terms. I don't think the
ExtractingRequestHandler ever automatically stored the raw data; in
fact Lucene works in Strings internally, not raw byte arrays (this is
changing).

It should be indexed- that means if you search 'text' with a word from
the document, it will find those documents and bring back the file
name. Your app has to then use the file name.  Solr/Lucene is not
intended as a general-purpose content store, only an index.

The ERH wiki page doesn't quite say this. It describes what the ERH
does rather than what it does not do :)
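To make that concrete, whether the raw text comes back in results is controlled per field in schema.xml. A sketch (the field name "text" is the example schema's default; treat the type name as an assumption about your schema):

```xml
<!-- indexed="true" : the field is searchable.
     stored="false": matches are found, but only the other stored fields
     (e.g. the file name) come back in the response; set stored="true"
     if you need the raw extracted text returned as well. -->
<field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
```

Storing the full extracted text can grow the index considerably, which is one reason the example schema leaves it unstored.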

On Mon, Jul 26, 2010 at 12:00 PM, David Thibault dthiba...@esperion.com wrote:
 Hello all,

 I’m working on a project with Solr.  I had 1.4.1 working OK using 
 ExtractingRequestHandler except that it was crashing on some PDFs.  I noticed 
 that Tika bundled with 1.4.1 was 0.4, which was kind of old.  I decided to 
 try updating to 0.7 as per the directions here: 
 http://wiki.apache.org/solr/ExtractingRequestHandler  but it was giving me 
 errors (I forget what they were specifically).

 Then I tried downloading Solr 3.1 from the source repository, which I noticed 
 came with Tika 0.7.  I figured this would be an easier route to get working.  
 Now I’m testing with 3.1 and 0.7 and I’m noticing my documents are going into 
 Solr OK, but they all have blank content (no document text stored in Solr).  
 I did see that the default “text” field is not stored. Changing that to 
 stored=true didn’t help.  Changing to 
 fmap.content=attr_content&uprefix=attr_content didn’t help either.  I have 
 attached all relevant info here.  Please let me know if someone sees 
 something I don’t (it’s entirely possible as I’m relatively new to Solr).

 Schema.xml:
 <?xml version="1.0" encoding="UTF-8" ?>
 <schema name="example" version="1.3">
  <types>
   <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
   <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true"/>
   <fieldtype name="binary" class="solr.BinaryField"/>
   <fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
   <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
   <fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
   <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
   <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
   <fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
   <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
   <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
   <fieldType name="date" class="solr.TrieDateField" omitNorms="true" precisionStep="0" positionIncrementGap="0"/>
   <fieldType name="tdate" class="solr.TrieDateField" omitNorms="true" precisionStep="6" positionIncrementGap="0"/>
   <fieldType name="pint" class="solr.IntField" omitNorms="true"/>
   <fieldType name="plong" class="solr.LongField" omitNorms="true"/>
   <fieldType name="pfloat" class="solr.FloatField" 

RE: Extracting PDF text/comment/callout/typewriter boxes with Solr CELL/Tika/PDFBox

2010-07-28 Thread David Thibault
Thanks, I'll try that then. I kind of figured that'd be the answer, but after 
fighting with Solr & ExtractingRequestHandler for 2 days I also just wanted to 
be done with it once it started working with 4.0...=)  However, stability would 
be better in the long run.

Best,
Dave

-Original Message-
From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com] 
Sent: Wednesday, July 28, 2010 9:33 AM
To: solr-user@lucene.apache.org
Subject: Re: Extracting PDF text/comment/callout/typewriter boxes with Solr 
CELL/Tika/PDFBox

In my opinion, the 1.4.1 version with the patch is more stable,
at least until 4.0 is released.

2010/7/28 David Thibault dthiba...@esperion.com

 Yesterday I did get this working with version 4.0 from trunk.  I haven't
 fully tested it yet, but the content doesn't come through blank anymore, so
 that's good.  Would it be more stable to stick with 1.4.1 and your patch to
 get to Tika 0.8, or to stick with the 4.0 trunk version?

 Best,
 Dave

 -Original Message-
 From: Tommaso Teofili [mailto:tommaso.teof...@gmail.com]
 Sent: Wednesday, July 28, 2010 3:31 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Extracting PDF text/comment/callout/typewriter boxes with Solr
 CELL/Tika/PDFBox

 I attached a patch for Solr 1.4.1 release on
 https://issues.apache.org/jira/browse/SOLR-1902 that made things work for
 me.
 This strange behaviour for me was due to the fact that I copied the patched
 jars and war inside the dist directory but forgot to update the war inside
 the example/webapps directory (that is inside Jetty).
 Hope this helps.
 Tommaso

 2010/7/27 David Thibault dthiba...@esperion.com

  Alessandro & all,
 
  I was having the same issue with Tika crashing on certain PDFs.  I also
  noticed the bug where no content was extracted after upgrading Tika.
 
  When I went to the SOLR issue you link to below, I applied all the
 patches,
  downloaded the Tika 0.8 jars, restarted tomcat, posted a file via curl,
 and
  got the following error:
  SEVERE: java.lang.NoSuchMethodError:
  org.apache.solr.core.SolrResourceLoader.getClassLoader()Ljava/lang/ClassLoader;
  [...]
 
  This is really weird because I DID apply the SolrResourceLoader patch
 that
  adds the getClassLoader method.  I even verified by going opening up the
  JARs and looking at the class file in Eclipse...I can see the
  SolrResourceLoader.getClassLoader() method.
 
  Does anyone know why it can't find the method?  After patching the source
 I
  did ant clean dist in the base directory of the Solr source tree and
  everything looked like it compiles (BUILD SUCCESSFUL).  Then I copied all
  the jars from dist/ and all the library dependencies from
  contrib/extraction/lib/ into my SOLR_HOME. Restarting tomcat, everything
 in
  the logs looked good.
 
  I'm stumped.  It would be very nice to have a Solr implementation using
 the
   newest versions of PDFBox & Tika and actually have content being
  extracted...=)
 
  Best,
  Dave
 
 
  -Original Message-
  From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com]
  Sent: Tuesday, July 27, 2010 6:09 AM
  To: solr-user@lucene.apache.org
  Subject: Re: Extracting PDF text/comment/callout/typewriter boxes with
 Solr
  CELL/Tika/PDFBox
 
  Hi Jon,
   During the last few days we have faced the same problem.
   Using classic Solr 1.4.1 (Tika 0.4), from some PDF files we can't 

Re: logic required for newbie

2010-07-28 Thread rajini maski
you can index each of these fields separately...
field1 - id
field2 - name
field3 - user_id
field4 - country


field7 - landmark

While querying you can specify q=landmark9. This will return you results.
And if you want only particular fields in the output, use the fl parameter in
the query, like:

http://localhost:8090/solr/select?indent=on&q=landmark9&fl=id,user_id,country,landmark

This will give you your desired solution.




On Wed, Jul 28, 2010 at 12:23 PM, Jonty Rhods jonty.rh...@gmail.com wrote:

 Hi All,

 I am very new and learning solr.

 I have 10 column like following in table

 1. id
 2. name
 3. user_id
 4. location
 5. country
 6. landmark1
 7. landmark2
 8. landmark3
 9. landmark4
 10. landmark5

 when a user searches for a landmark, I want to return only the one landmark
 that matches. The rest of the landmarks should be ignored.
 The expected result is like the following if a user searches by landmark2:

 1. id
 2. name
 3. user_id
 4. location
 5. country
 7. landmark2

 or if search by landmark9

 1. id
 2. name
 3. user_id
 4. location
 5. country
 9. landmark9


 please help me to design the schema for this kind of requirement...

 thanks
 with regards



Re: Extracting PDF text/comment/callout/typewriter boxes with Solr CELL/Tika/PDFBox

2010-07-28 Thread Tommaso Teofili
This was my feeling too :-) and so I went for the trunk to have things
working quickly, but I also have to consider which version is the best,
since I am going to deploy it in the near future in an enterprise
environment, and choosing the best version is an important step.
I am quite new to Solr, but I agree with Alessandro that using a
slightly patched release should theoretically be more stable than the trunk,
which gets many updates weekly (and daily).
Cheers,
Tommaso

2010/7/28 David Thibault dthiba...@esperion.com

 Thanks, I'll try that then. I kind of figured that'd be the answer, but
  after fighting with Solr & ExtractingRequestHandler for 2 days I also just
 wanted to be done with it once it started working with 4.0...=)  However,
 stability would be better in the long run.

 Best,
 Dave

 -Original Message-
 From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com]
 Sent: Wednesday, July 28, 2010 9:33 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Extracting PDF text/comment/callout/typewriter boxes with Solr
 CELL/Tika/PDFBox

  In my opinion, the 1.4.1 version with the patch is more stable,
  at least until 4.0 is released.

 2010/7/28 David Thibault dthiba...@esperion.com

  Yesterday I did get this working with version 4.0 from trunk.  I haven't
  fully tested it yet, but the content doesn't come through blank anymore,
 so
  that's good.  Would it be more stable to stick with 1.4.1 and your patch
 to
  get to Tika 0.8, or to stick with the 4.0 trunk version?
 
  Best,
  Dave
 
  -Original Message-
  From: Tommaso Teofili [mailto:tommaso.teof...@gmail.com]
  Sent: Wednesday, July 28, 2010 3:31 AM
  To: solr-user@lucene.apache.org
  Subject: Re: Extracting PDF text/comment/callout/typewriter boxes with
 Solr
  CELL/Tika/PDFBox
 
  I attached a patch for Solr 1.4.1 release on
  https://issues.apache.org/jira/browse/SOLR-1902 that made things work
 for
  me.
  This strange behaviour for me was due to the fact that I copied the
 patched
  jars and war inside the dist directory but forgot to update the war
 inside
  the example/webapps directory (that is inside Jetty).
  Hope this helps.
  Tommaso
 
  2010/7/27 David Thibault dthiba...@esperion.com
 
    Alessandro & all,
  
   I was having the same issue with Tika crashing on certain PDFs.  I also
   noticed the bug where no content was extracted after upgrading Tika.
  
   When I went to the SOLR issue you link to below, I applied all the
  patches,
   downloaded the Tika 0.8 jars, restarted tomcat, posted a file via curl,
  and
   got the following error:
    SEVERE: java.lang.NoSuchMethodError:
    org.apache.solr.core.SolrResourceLoader.getClassLoader()Ljava/lang/ClassLoader;
    [...]
  
   This is really weird because I DID apply the SolrResourceLoader patch
  that
   adds the getClassLoader method.  I even verified by going opening up
 the
   JARs and looking at the class file in Eclipse...I can see the
   SolrResourceLoader.getClassLoader() method.
  
   Does anyone know why it can't find the method?  After patching the
 source
  I
   did ant clean dist in the base directory of the Solr source tree and
   everything looked like it compiles (BUILD SUCCESSFUL).  Then I copied
 all
   the jars from dist/ and all the library dependencies from
   

simple question from a newbie

2010-07-28 Thread Nguyen, Vincent (CDC/OSELS/NCPHI) (CTR)
Hi,

 

I'm new to Solr and have a rather dumb question.  I want to do a query
that returns all the Titles that start with a certain letter.  For
example

 

I have these titles:

Results of in-mine research in support

Cancer Reports

State injury indicators report

Cancer Reports

Indexed dermal bibliography

Childhood agricultural-related injury report

Childhood agricultural injury prevention

 

 

I want the query to return:

Cancer Reports

Cancer Reports

Childhood agricultural-related injury report

Childhood agricultural injury prevention

 

I want something like dc.title=c* type query

 

I know that I can facet by dc.title and then use the parameter
facet.prefix=c but it returns something like this:

Cancer Reports [2]

Childhood agricultural-related injury report [1]

Childhood agricultural injury prevention [1]

 

 

Vincent Vu Nguyen
Division of Science Quality and Translation

Office of the Associate Director for Science
Centers for Disease Control and Prevention (CDC)
404-498-6154
Century Bldg 2400
Atlanta, GA 30329 

 



Re: SolrJ Response + JSON

2010-07-28 Thread MitchK

Thank you, Chantal.

I have looked at this one: http://www.json.org/java/index.html

This seems to be an easy-to-understand-implementation.

However, I am wondering how to determine whether a SolrDocument's field 
is multiValued or not.
The JSONResponseWriter of Solr looks at the schema-configuration. 
However, the client shouldn't do that.

How did you solve that problem?

Thanks for sharing ideas.

- Mitch


Am 28.07.2010 15:35, schrieb Chantal Ackermann:

You could use org.apache.solr.handler.JsonLoader.
That one uses org.apache.noggit.JSONParser internally.
I've used the JacksonParser with Spring.

http://json.org/ lists parsers for different programming languages.

Cheers,
Chantal

On Wed, 2010-07-28 at 15:08 +0200, MitchK wrote:
   

Hello,

Second try to send a mail to the mailing list...

I need to translate SolrJ's response into JSON-response.
I can not query Solr directly, because I need to do some math with the
responsed data, before I show the results to the client.

Any experiences how to translate SolrJ's response into JSON without writing
your own JSON Writer?

Thank you.
- Mitch
 



   




display solr result in JSP

2010-07-28 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
I am new to Solr. I just got the example XML files indexed and searchable by following the
Solr tutorial. I wonder how I can get the search results displayed in a JSP. I really
appreciate any suggestions you can give.

Thanks so much,
Xiaohui


Re: SolrJ Response + JSON

2010-07-28 Thread Chantal Ackermann
Hi Mitch

On Wed, 2010-07-28 at 16:38 +0200, MitchK wrote:
 Thank you, Chantal.
 
 I have looked at this one: http://www.json.org/java/index.html
 
 This seems to be an easy-to-understand-implementation.
 
 However, I am wondering how to determine whether a SolrDocument's field 
 is multiValued or not.
 The JSONResponseWriter of Solr looks at the schema-configuration. 
 However, the client shouldn't do that.
 How did you solve that problem?

I didn't. I'm not recreating JSON from the SolrJ results.

I would try to use the same classes that SolrJ uses, actually. (Writing
that without having a further look at the code.) I would avoid
recreating existing code as much as possible.
About multivalued fields: you need instanceof checks, I guess. The field
only contains a list if there really are multiple values. (That's what
works for my ScriptTransformer.)
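A minimal sketch of that instanceof route, building JSON by hand from a plain map standing in for a SolrDocument (so it runs without SolrJ on the classpath; with SolrJ you would iterate the document's field map the same way). Real code would also escape quotes and backslashes inside strings:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only -- not Solr's JSONResponseWriter. It shows the instanceof
// check: a multi-valued field comes back as a Collection and a
// single-valued one as a plain Object, so no schema lookup is needed.
public class JsonSketch {

    public static String toJson(Map<String, Object> doc) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            if (!first) sb.append(',');
            first = false;
            sb.append('"').append(e.getKey()).append("\":");
            Object v = e.getValue();
            if (v instanceof Collection) {      // multi-valued field -> JSON array
                sb.append('[');
                boolean firstItem = true;
                for (Object item : (Collection<?>) v) {
                    if (!firstItem) sb.append(',');
                    firstItem = false;
                    sb.append(scalar(item));
                }
                sb.append(']');
            } else {                            // single-valued field -> JSON scalar
                sb.append(scalar(v));
            }
        }
        return sb.append('}').toString();
    }

    // Numbers and booleans go bare, everything else is quoted.
    private static String scalar(Object o) {
        return (o instanceof Number || o instanceof Boolean)
                ? String.valueOf(o) : "\"" + o + "\"";
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", 1);
        doc.put("title", "Cancer Reports");
        doc.put("cat", Arrays.asList("a", "b"));
        System.out.println(toJson(doc));
        // prints {"id":1,"title":"Cancer Reports","cat":["a","b"]}
    }
}
```

The point is only the Collection check; for production use an existing JSON library rather than hand-built strings.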

Are you sure that you cannot change the SOLR results at query time
according to your needs? Maybe you should ask for that, first (ask for X
instead of Y...).

Cheers,
Chantal


 
 Thanks for sharing ideas.
 
 - Mitch
 
 
 Am 28.07.2010 15:35, schrieb Chantal Ackermann:
  You could use org.apache.solr.handler.JsonLoader.
  That one uses org.apache.noggit.JSONParser internally.
  I've used the JacksonParser with Spring.
 
  http://json.org/ lists parsers for different programming languages.
 
  Cheers,
  Chantal
 
  On Wed, 2010-07-28 at 15:08 +0200, MitchK wrote:
 
  Hello,
 
  Second try to send a mail to the mailing list...
 
  I need to translate SolrJ's response into JSON-response.
  I can not query Solr directly, because I need to do some math with the
  responsed data, before I show the results to the client.
 
  Any experiences how to translate SolrJ's response into JSON without writing
  your own JSON Writer?
 
  Thank you.
  - Mitch
   
 
 
 





Re: logic required for newbie

2010-07-28 Thread Jonty Rhods
Hi

thanks for the reply..
Actually the requirement is different (sorry if I was unable to clarify it in
the first mail).

basically the following are the field names in the schema as well:
 1. id
 2. name
 3. user_id
 4. location
 5. country
 6. landmark1
 7. landmark2
 8. landmark3
 9. landmark4
 10. landmark5

which carry text...
for example:

<id>1</id>
<name>some name</name>
<user_id>user_id</user_id>
<location>new york</location>
<country>USA</country>
<landmark1>5th avenue</landmark1>
<landmark2>ms departmental store</landmark2>
<landmark3>base bakery</landmark3>
<landmark4>pizza hut</landmark4>
<landmark5>ford motor</landmark5>

now if a user searches by pizza, then the expected result is like:

<id>1</id>
<name>some name</name>
<user_id>user_id</user_id>
<location>new york</location>
<country>USA</country>
<landmark4>pizza hut</landmark4>

It means I want to ignore all the other landmarks which don't match. With a
filter we can filter the fields, but here I don't know the field name because
it depends on the text match.

Is there any other solution? I am ready to change the schema or the logic. I
am using SolrJ.

Please help me, I am stuck here..
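For what it's worth, one client-side sketch of the filtering described above, assuming each result document is available as a field-name/value map (with SolrJ you would read the fields out of each SolrDocument). The case-insensitive substring match is an assumption for illustration, not what Solr's analyzer does:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Sketch: keep the common fields and only the landmark field(s) whose
// value matches the query text. Field names follow the schema in this
// thread; the matching rule (case-insensitive substring) is an assumption.
public class LandmarkFilter {

    static final Set<String> COMMON =
            new HashSet<>(Arrays.asList("id", "name", "user_id", "location", "country"));

    public static Map<String, Object> filter(Map<String, Object> doc, String query) {
        Map<String, Object> out = new LinkedHashMap<>();
        String q = query.toLowerCase();
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            boolean matchingLandmark = e.getKey().startsWith("landmark")
                    && String.valueOf(e.getValue()).toLowerCase().contains(q);
            if (COMMON.contains(e.getKey()) || matchingLandmark) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "1");
        doc.put("name", "some name");
        doc.put("user_id", "u42");
        doc.put("location", "new york");
        doc.put("country", "USA");
        doc.put("landmark1", "5th avenue");
        doc.put("landmark4", "pizza hut");
        System.out.println(filter(doc, "pizza").keySet());
        // prints [id, name, user_id, location, country, landmark4]
    }
}
```

A more Solr-idiomatic alternative would be a single multiValued landmark field combined with highlighting, so the server tells you which value matched; the sketch above just works with the existing ten-column schema.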

with regards


On Wed, Jul 28, 2010 at 7:22 PM, rajini maski rajinima...@gmail.com wrote:

 you can index each of these field separately...
 field1- Id
 field2- name
 field3-user_id
 field4-country.

 
 field7- landmark

 While quering  you can specify  q=Landmark9 This will return you
 results..
 And if you want only particular fields in output.. use the fl parameter
 in
 query...

 like

 http://localhost:8090/solr/select?indent=on&q=landmark9&fl=ID,user_id,country,landmark

 This will give your desired solution..




 On Wed, Jul 28, 2010 at 12:23 PM, Jonty Rhods jonty.rh...@gmail.com
 wrote:

  Hi All,
 
  I am very new and learning solr.
 
  I have 10 column like following in table
 
  1. id
  2. name
  3. user_id
  4. location
  5. country
  6. landmark1
  7. landmark2
  8. landmark3
  9. landmark4
  10. landmark5
 
  when user search for landmark then  I want to return only one landmark
  which
  match. Rest of the landmark should be ignored..
  expected result like following if user search by landmark2..
 
  1. id
  2. name
  3. user_id
  4. location
  5. country
  7. landmark2
 
  or if search by landmark9
 
  1. id
  2. name
  3. user_id
  4. location
  5. country
  9. landmark9
 
 
  please help me to design the schema for this kind of requirement...
 
  thanks
  with regards
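
Since the matching field name is not known until query time, one client-side workaround with solrj is to post-process each returned document and drop the landmark fields that do not contain the search term. Below is a minimal sketch in plain Java — the document is modeled as a Map of field values (the way a SolrDocument exposes them); class and method names are illustrative, not from any library:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LandmarkFilter {
    // Keep only the landmark fields whose value contains the query term;
    // non-landmark fields (id, name, user_id, location, country) pass through.
    static Map<String, Object> keepMatchingLandmarks(Map<String, Object> doc, String term) {
        Map<String, Object> out = new LinkedHashMap<>();
        String needle = term.toLowerCase();
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            boolean isLandmark = e.getKey().startsWith("landmark");
            if (!isLandmark || String.valueOf(e.getValue()).toLowerCase().contains(needle)) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", 1);
        doc.put("name", "some name");
        doc.put("landmark1", "5th avenue");
        doc.put("landmark4", "piza hut");
        // landmark1 is dropped, landmark4 survives; id and name pass through
        System.out.println(keepMatchingLandmarks(doc, "piza"));
    }
}
```

This keeps the schema unchanged; the filtering happens after the response comes back from Solr.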
 



Re: display solr result in JSP

2010-07-28 Thread Ranveer

Hi,

it is very simple to display a value in a JSP. If you are using solrj then simply 
store the value in a bean from your java class and display it.
You can do the same thing in a servlet too: get the solr server response and 
return it in a bean, or display it directly (in the servlet).

hope you will be able to do it.

regards
Ranveer

On Wednesday 28 July 2010 08:11 PM, Ma, Xiaohui (NIH/NLM/LHC) [C] wrote:

I am new to solr. I just got the example xml file indexed and searched by following the solr 
tutorial. I wonder how I can get the search results displayed in a JSP. I really 
appreciate any suggestions you can give.

Thanks so much,
Xiaohui

   




Re: simple question from a newbie

2010-07-28 Thread Ranveer
I think you are using wild-card search, or should use wild-card search. But 
first of all please provide the schema and configuration file for more 
details.


regards
Ranveer


On Wednesday 28 July 2010 07:51 PM, Nguyen, Vincent (CDC/OSELS/NCPHI) 
(CTR) wrote:

Hi,



I'm new to Solr and have a rather dumb question.  I want to do a query
that returns all the Titles that start with a certain letter.  For
example



I have these titles:

Results of in-mine research in support

Cancer Reports

State injury indicators report

Cancer Reports

Indexed dermal bibliography

Childhood agricultural-related injury report

Childhood agricultural injury prevention





I want the query to return:

Cancer Reports

Cancer Reports

Childhood agricultural-related injury report

Childhood agricultural injury prevention



I want something like dc.title=c* type query



I know that I can facet by dc.title and then use the parameter
facet.prefix=c but it returns something like this:

Cancer Reports [2]

Childhood agricultural-related injury report [1]

Childhood agricultural injury prevention [1]





Vincent Vu Nguyen
Division of Science Quality and Translation

Office of the Associate Director for Science
Centers for Disease Control and Prevention (CDC)
404-498-6154
Century Bldg 2400
Atlanta, GA 30329




   




RE: Indexing Problem: Where's my data?

2010-07-28 Thread Michael Griffiths
Thanks - but my schema.xml is not recognizing field names specified in the 
data-config.xml.

For example - and I just tested this now - if I have in my data-config.xml:

<field column="product_id" name="pid" />

And then in my schema.xml:

<field name="pid" type="int" indexed="true" stored="true" required="true" />

Then no documents are processed (e.g. I get rows queried, but <str name="Total Documents Processed">0</str> in the data handler UI).

But if I change that to:

<field name="product_id" type="int" indexed="true" stored="true" required="true" />

... now documents are processed (e.g. <str name="Total Documents Processed">313</str>).

Which, quite frankly, confuses me. I may be doing something else wrong (I 
changed my SQL as well, so I'm getting another failure, but I think it's 
separate to this one).

-Original Message-
From: Lance Norskog [mailto:goks...@gmail.com] 
Sent: Tuesday, July 27, 2010 8:25 PM
To: solr-user@lucene.apache.org
Subject: Re: Indexing Problem: Where's my data?

Solr respects case for field names.  Database fields are supplied in 
lower-case, so it should be 'attribute_name' and 'string_value'. Also 
'product_id', etc.

It is easier if you carefully emulate every detail in the examples, for example 
lower-case names.

On Tue, Jul 27, 2010 at 2:59 PM, kenf_nc ken.fos...@realestate.com wrote:

 for STRING_VALUE, I assume there is a property in the 'select *' 
 results called string_value? if so I'm not sure why it wouldn't work. 
 If not, then that's why, it doesn't have anything to put there.

 For ATTRIBUTE_NAME, is it possibly a case issue? you called it 
 'Attribute_Name' in your query, but ATTRIBUTE_NAME in your 
 schema...just something to check I guess.

 Also, not sure why you are using name= in your fields, for example, 
  <field column="PARENT_FAMILY" name="Parent Family" />. I thought 
 'column' was the source field name and 'name' was supposed to be the 
 schema field name and if not there it would assume 'column' name. You 
 don't have a schema field called Parent Family so it looks like it's 
 defaulting to column name too which is lucky for you I suppose. But 
 you may want to either remove 'name=' or make it match the schema. 
 (and I may be completely wrong on this, it's been a while since I got DIH 
 going).


 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Indexing-Problem-Where-s-my-data-tp
 1000660p1000843.html Sent from the Solr - User mailing list archive at 
 Nabble.com.




--
Lance Norskog
goks...@gmail.com
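
To make the case-sensitivity point above concrete: the `name` attribute in data-config.xml must match a field declared in schema.xml exactly, including case. A minimal sketch of a matching pair (the entity name and SQL query here are illustrative, not taken from the thread):

```xml
<!-- data-config.xml: map the SQL column product_id to the Solr field "pid" -->
<entity name="product" query="select product_id from products">
  <field column="product_id" name="pid" />
</entity>

<!-- schema.xml: the target field must exist with exactly the same name -->
<field name="pid" type="int" indexed="true" stored="true" required="true" />
```

If the `name` attribute and the schema field disagree (even only in case), DIH silently indexes nothing for that field.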




Re: Spellchecking and frequency

2010-07-28 Thread Jonathan Rochkind



I therefore wrote an implementation of SolrSpellChecker that wraps jazzy,
the java aspell library. I also extended the SpellCheckComponent to take
the
matrix of suggested words and query the corpus to find the first
combination
of suggestions which returned a match. This works well for my use case,
where term frequency is irrelevant to spelling or scoring.


This is interesting to me. I also have not been that happy with standard 
solr spellcheck. 

In addition to possibly filing a JIRA for future fix to Solr itself, 
another option would be you could make your 'alternate' SpellCheck 
component available as a seperate .jar, so anyone could use it just by 
installing and specifying it in their solrconfig.xml.  I would encourage 
you to consider that, not as a replacement for suggesting a patch to 
Solr itself, but so people can use your improved spellchecker 
immediately, without waiting for possible Solr patches.


Jonathan



Re: Is there a cache for a query?

2010-07-28 Thread Moazzam Khan
As far as I know all searches get cached at least for some time. I am
not sure about field collapse results being cached.

- Moazzam
http://moazzam-khan.com



On Mon, Jul 26, 2010 at 9:48 PM, Li Li fancye...@gmail.com wrote:
 I want a cache to cache all results of a query (all steps including
 collapse, highlight and facet).  I read
 http://wiki.apache.org/solr/SolrCaching, but can't find a global
 cache. Maybe I can use external cache to store key-value. Is there any
 one in solr?



Re: SolrJ Response + JSON

2010-07-28 Thread MitchK

Hi Chantal,

thank you for the feedback.
I did not see the wood for the trees!
The SolrDocument's javadoc says the following: 
http://lucene.apache.org/solr/api/org/apache/solr/common/SolrDocument.html


getFieldValue(String name)


  Get the value or collection of values for a given field.

The magical word here is that little "or". :-)

I will try that tomorrow and give you a feedback!


Are you sure that you cannot change the SOLR results at query time
according to your needs?


Unfortunately, it is not possible in this case.

Kind regards,
Mitch


Am 28.07.2010 16:49, schrieb Chantal Ackermann:

Hi Mitch

On Wed, 2010-07-28 at 16:38 +0200, MitchK wrote:
   

Thank you, Chantal.

I have looked at this one: http://www.json.org/java/index.html

This seems to be an easy-to-understand-implementation.

However, I am wondering how to determine whether a SolrDocument's field
is multiValued or not.
The JSONResponseWriter of Solr looks at the schema-configuration.
However, the client shouldn't do that.
How did you solved that problem?
 

I didn't. I'm not recreating JSON from the SolrJ results.

I would try to use the same classes that SolrJ uses, actually. (Writing
that without having a further look at the code.) I would avoid
recreating existing code as much as possible.
About multivalued fields: you need instanceof checks, I guess. The field
only contains a list if there really are multiple values. (That's what
works for my ScriptTransformer.)

Are you sure that you cannot change the SOLR results at query time
according to your needs? Maybe you should ask for that, first (ask for X
instead of Y...).

Cheers,
Chantal


   

Thanks for sharing ideas.

- Mitch


Am 28.07.2010 15:35, schrieb Chantal Ackermann:
 

You could use org.apache.solr.handler.JsonLoader.
That one uses org.apache.noggit.JSONParser internally.
I've used the JacksonParser with Spring.

http://json.org/ lists parsers for different programming languages.

Cheers,
Chantal

On Wed, 2010-07-28 at 15:08 +0200, MitchK wrote:

   

Hello ,

Second try to send a mail to the mailing list...

I need to translate SolrJ's response into JSON-response.
I can not query Solr directly, because I need to do some math with the
responsed data, before I show the results to the client.

Any experiences how to translate SolrJ's response into JSON without writing
your own JSON Writer?

Thank you.
- Mitch

 



   




   


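Following up on the instanceof idea above: without consulting the schema, client code can only decide "array vs. scalar" from the runtime type that getFieldValue returns. Here is a minimal, hedged sketch in plain Java of a JSON writer applying exactly that rule — the document is treated as a plain Map of field values (as a SolrDocument exposes them), and the escaping is deliberately simplistic:

```java
import java.util.Collection;
import java.util.Map;

public class SimpleJsonWriter {
    // Serialize one result document (field -> value or collection of values)
    // into a JSON object string. A field is written as a JSON array exactly
    // when its value is a Collection - no schema lookup needed.
    static String toJson(Map<String, Object> doc) {
        StringBuilder sb = new StringBuilder("{");
        boolean firstField = true;
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            if (!firstField) sb.append(",");
            firstField = false;
            sb.append(quote(e.getKey())).append(":");
            Object v = e.getValue();
            if (v instanceof Collection) {
                sb.append("[");
                boolean firstItem = true;
                for (Object item : (Collection<?>) v) {
                    if (!firstItem) sb.append(",");
                    firstItem = false;
                    sb.append(scalar(item));
                }
                sb.append("]");
            } else {
                sb.append(scalar(v));
            }
        }
        return sb.append("}").toString();
    }

    // Numbers and booleans are emitted bare; everything else as a quoted string.
    static String scalar(Object v) {
        if (v instanceof Number || v instanceof Boolean) return v.toString();
        return quote(String.valueOf(v));
    }

    // Minimal escaping: backslashes and double quotes only.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```

For production use a real JSON library (as suggested earlier in the thread) is safer; this only illustrates the runtime-type check.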


RE: simple question from a newbie

2010-07-28 Thread Nguyen, Vincent (CDC/OSELS/NCPHI) (CTR)
I think I got it to work.  If I do a wildcard search using the dc3.title
field it seems to work fine (dc3.title:c*).  The dc.title:c* returns
every title that has a word in it that starts with 'c', which isn't
exactly what I wanted.  I'm guessing it's because of the
type="caseInsensitiveSort".  

Well, here is my schema for reference.  Thanks for your help.


<schema name="example" version="1.1">
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true" />
    <!-- boolean type: "true" or "false" -->
    <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true" />
    <fieldType name="integer" class="solr.IntField" omitNorms="true" />
    <fieldType name="long" class="solr.LongField" omitNorms="true" />
    <fieldType name="float" class="solr.FloatField" omitNorms="true" />
    <fieldType name="double" class="solr.DoubleField" omitNorms="true" />
    <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true" />
    <fieldType name="slong" class="solr.SortableLongField" sortMissingLast="true" omitNorms="true" />
    <fieldType name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true" />
    <fieldType name="sdouble" class="solr.SortableDoubleField" sortMissingLast="true" omitNorms="true" />
    <fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true" />
    <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
      </analyzer>
    </fieldType>
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt" />
        <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt" />
        <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
      </analyzer>
    </fieldType>
    <fieldType name="textTight" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false" />
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt" />
        <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
      </analyzer>
    </fieldType>
    <fieldType name="caseInsensitiveSort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.TrimFilterFactory" />
      </analyzer>
    </fieldType>
    <fieldtype name="ignored" stored="false" indexed="false" class="solr.StrField" />
  </types>
  <fields>
    <!-- Fedora specific fields -->
    <field name="PID" type="string" indexed="true" stored="true" />
    <field name="fgs.state" type="string" indexed="true" stored="true" />
    <field name="fgs.label" type="text" indexed="true" stored="true" />
    <field name="fgs.ownerId" type="string" indexed="true" stored="true" />
    <field name="fgs.createdDate" type="date" indexed="true" stored="true" />
    <field name="fgs.lastModifiedDate" type="date" indexed="true" stored="true" />
    <field name="fgs.contentModel" type="string" indexed="true" stored="true" />
    <field name="fgs.type" type="string" indexed="true" stored="true" multiValued="true" />
    <!-- DC Fields -->
    <field name="dc.contributor" type="text" indexed="true" stored="true" multiValued="true" />
    <field name="dc.coverage" type="text" indexed="true" stored="true" multiValued="true" />
    <field name="dc.creator" type="text" indexed="true" stored="true" multiValued="true" />
    <field name="dc.date" type="text" indexed="true" stored="true" multiValued="true" />
    <field name="dc.description" type="text" indexed="true" stored="true" multiValued="true" />
    <field name="dc.format" type="text" indexed="true" stored="true" multiValued="true" />
    <field name="dc.identifier" type="text" indexed="true" stored="true" multiValued="true" />
    <field name="dc.language" type="text" indexed="true" stored="true" multiValued="true" />
    <field name="dc.publisher" type="text" indexed="true" stored="true" multiValued="true" />
    <field name="dc.relation" type="text" indexed="true" 

Solr 1.4.1 field collapse

2010-07-28 Thread Moazzam Khan
Hi guys,

I read somewhere that Solr 1.4.1 has field collapse support by default
(without patching it) but I haven't been able to confirm it. Is this
true?

- Moazzam


Re: slave index is bigger than master index

2010-07-28 Thread Muneeb Ali

Well I do have disk limitations too, and that's why I think slave nodes died,
when replicating data from master node. (as it was just adding on top of
existing index files).

:: What do you mean here? Optimizing is too CPU expensive? 

What I meant by avoid playing around with slave nodes is avoiding anything
(including optimization on slave nodes) that may affect the live search
performance, unless I have no option.

:: Do you mean increase to double size? 

yes, as it did before on replication. But I didn't get a chance to run the
indexer yesterday. 

 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/slave-index-is-bigger-than-master-index-tp996329p1002426.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: slave index is bigger than master index

2010-07-28 Thread Muneeb Ali

 In solrconfig.xml, these two lines control that. Maybe they need to be
increased.
 <str name="httpConnTimeout">5000</str>
 <str name="httpReadTimeout">1</str> 

Where do I add those in solrconfig? These lines don't seem to be present
in the example solrconfig file...
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/slave-index-is-bigger-than-master-index-tp996329p1002432.html
Sent from the Solr - User mailing list archive at Nabble.com.
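
For reference, in Solr 1.4 these timeout settings belong inside the slave section of the /replication request handler in the slave's solrconfig.xml. A minimal sketch — the masterUrl, pollInterval, and timeout values here are illustrative, not prescriptive:

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <!-- where to pull the index from, and how often to poll -->
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
    <!-- connect/read timeouts (ms) used while fetching index files -->
    <str name="httpConnTimeout">5000</str>
    <str name="httpReadTimeout">10000</str>
  </lst>
</requestHandler>
```

The example solrconfig.xml omits the timeout lines because they are optional; adding them only to the slave config is enough.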


How do NOT queries work?

2010-07-28 Thread Kaan Meralan
I wonder how do NOT queries work. Is it a pass on the result set and
filtering out the NOT property or something like that?

Also is there anybody who does some performance checks on NOT queries? I
want to know whether there is a significant performance degradation or not
when you have NOT in a query.

Thanks...

//kaan


RE: display solr result in JSP

2010-07-28 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Thanks so much for your reply. I don't have much experience with JSP. I found a tag 
library, and am trying to use <xsltlib:apply xml="<%= 
url.getContent().toString() %>" xsl="/xsl/result.xsl" />. Unfortunately I 
didn't get it to work. 

Would you please give me more information? I really appreciate your help!

Thanks,
Xiaohui 

-Original Message-
From: Ranveer [mailto:ranveer.s...@gmail.com] 
Sent: Wednesday, July 28, 2010 11:27 AM
To: solr-user@lucene.apache.org
Subject: Re: display solr result in JSP

Hi,

it is very simple to display a value in a JSP. If you are using solrj then simply 
store the value in a bean from your java class and display it.
You can do the same thing in a servlet too: get the solr server response and 
return it in a bean, or display it directly (in the servlet).
hope you will be able to do it.

regards
Ranveer

On Wednesday 28 July 2010 08:11 PM, Ma, Xiaohui (NIH/NLM/LHC) [C] wrote:
  I am new to solr. I just got the example xml file indexed and searched by following 
  the solr tutorial. I wonder how I can get the search results displayed in a JSP. I 
  really appreciate any suggestions you can give.

 Thanks so much,
 Xiaohui





Re: Total number of terms in an index?

2010-07-28 Thread Jason Rutherglen
Tom,

The total number of terms... Ah well, not a big deal, however yes the
flex branch does expose this so we can show this in Solr at some
point, hopefully outside of Solr's Luke impl.

On Tue, Jul 27, 2010 at 9:27 AM, Burton-West, Tom tburt...@umich.edu wrote:
 Hi Jason,

 Are you looking for the total number of unique terms or total number of term 
 occurrences?

 Checkindex reports both, but does a bunch of other work so is probably not 
 the fastest.

 If you are looking for total number of term occurrences, you might look at 
 contrib/org/apache/lucene/misc/HighFreqTerms.java.

 If you are just looking for the total number of unique terms, I wonder if 
 there is some low level API that would allow you to just access the in-memory 
 representation of the tii file and then multiply the number of terms in it by 
 your indexDivisor (default 128). I haven't dug in to the code so I don't 
 actually know how the tii file gets loaded into a data structure in memory.  
 If there is api access, it seems like this might be the quickest way to get 
 the number of unique terms.  (Of course you would have to do this for each 
 segment).

 Tom
 -Original Message-
 From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
 Sent: Monday, July 26, 2010 8:39 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Total number of terms in an index?


 : Sorry, like the subject, I mean the total number of terms.

  it's not stored anywhere, so the only way to fetch it is to actually
  iterate all of the terms and count them (that's why LukeRequestHandler is
  so slow to compute this particular value)

  If i remember right, someone mentioned at one point that flex would let
  you store data about stuff like this in your index as part of the segment
  writing, but frankly i'm still not sure how that will help -- because
  unless your index is fully optimized, you still have to iterate the terms
  in each segment to 'de-dup' them.


 -Hoss




RE: Extracting PDF text/comment/callout/typewriter boxes with Solr CELL/Tika/PDFBox

2010-07-28 Thread David Thibault
Tommaso,

I used your patch and tried it with the 1.4.1 solr.war from a fresh 1.4.1 
distribution, and it still gave me that NoSuchMethodError.  However, when I 
tried it with the newly-patched-and-compiled apache-solr-1.4.2-dev.war file it 
works.  I think I tried that before and it didn't work. 

In any case, thanks for the patch and the advice.  Looks like now it's working 
for me.

Best,
Dave




-Original Message-
From: Tommaso Teofili [mailto:tommaso.teof...@gmail.com] 
Sent: Wednesday, July 28, 2010 3:31 AM
To: solr-user@lucene.apache.org
Subject: Re: Extracting PDF text/comment/callout/typewriter boxes with Solr 
CELL/Tika/PDFBox

I attached a patch for Solr 1.4.1 release on
https://issues.apache.org/jira/browse/SOLR-1902 that made things work for
me.
This strange behaviour for me was due to the fact that I copied the patched
jars and war inside the dist directory but forgot to update the war inside
the example/webapps directory (that is inside Jetty).
Hope this helps.
Tommaso

2010/7/27 David Thibault dthiba...@esperion.com

 Alessandro & all,

 I was having the same issue with Tika crashing on certain PDFs.  I also
 noticed the bug where no content was extracted after upgrading Tika.

 When I went to the SOLR issue you link to below, I applied all the patches,
 downloaded the Tika 0.8 jars, restarted tomcat, posted a file via curl, and
 got the following error:
 SEVERE: java.lang.NoSuchMethodError:
 org.apache.solr.core.SolrResourceLoader.getClassLoader()Ljava/lang/ClassLoader;
 at
 org.apache.solr.handler.extraction.ExtractingRequestHandler.inform(ExtractingRequestHandler.java:93)
 at
 org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:244)
 at
 org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:231)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
 at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
 at
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
 at
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 at
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
 at
 org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:859)
 at
 org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:579)
 at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1555)
 at java.lang.Thread.run(Thread.java:619)

 This is really weird because I DID apply the SolrResourceLoader patch that
 adds the getClassLoader method.  I even verified by going opening up the
 JARs and looking at the class file in Eclipse...I can see the
 SolrResourceLoader.getClassLoader() method.

 Does anyone know why it can't find the method?  After patching the source I
 did ant clean dist in the base directory of the Solr source tree and
 everything looked like it compiles (BUILD SUCCESSFUL).  Then I copied all
 the jars from dist/ and all the library dependencies from
 contrib/extraction/lib/ into my SOLR_HOME. Restarting tomcat, everything in
 the logs looked good.

 I'm stumped.  It would be very nice to have a Solr implementation using the
 newest versions of PDFBox & Tika and actually have content being
 extracted...=)

 Best,
 Dave


 -Original Message-
 From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com]
 Sent: Tuesday, July 27, 2010 6:09 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Extracting PDF text/comment/callout/typewriter boxes with Solr
 CELL/Tika/PDFBox

 Hi Jon,
During the last days we faced the same problem.
Using Solr 1.4.1 classic (tika 0.4), from some pdf files we can't extract
content, and from others Solr throws an exception during the Indexing
Process.
 You must:
Update tika libraries (into /contrib/extraction/lib) with tika-core 0.8
snapshot and tika-parsers 0.8.
 Update PdfBox and all related libraries.
 After that You have to patch Solr 1.4.1 following this patch :

 https://issues.apache.org/jira/browse/SOLR-1902?page=com.atlassian.jira.plugin.ext.subversion%3Asubversion-commits-tabpanel
 This is the firts way to solve the problem.

 Using Solr 1.4.1 (with tika 0.8 snapshot and pdfbox updated) no exception
 is
 thrown during the Indexing process, but no content is extracted.
 Using last Solr trunk (with tika 0.8 snapshot and pdfbox updated)  

Re: Total number of terms in an index?

2010-07-28 Thread Jonathan Rochkind
At first I was thinking the TermsComponent might give you this, but 
oddly it seems not to.


http://wiki.apache.org/solr/TermsComponent




RE: How to 'filter' facet results

2010-07-28 Thread Nagelberg, Kallin
ManBearPig is still a threat.

-Kallin Nagelberg

-Original Message-
From: Jonathan Rochkind [mailto:rochk...@jhu.edu] 
Sent: Tuesday, July 27, 2010 7:44 PM
To: solr-user@lucene.apache.org
Subject: RE: How to 'filter' facet results

 Is there a way to tell Solr to only return a specific set of facet values?  I
 feel like the facet query must be able to do this, but I'm not really
 understanding the facet query.  In my specific case, I'd like to only see 
 facet
 values for the same values I pass in as query filters, i.e. if I run this 
 query:
fq=keyword:man OR keyword:bear OR keyword:pig
facet=on
facet.field=keyword

 then I only want it to return the facet counts for man, bear, and pig.  The
 resulting docs might have a number of different values for keyword, in 
 addition

For the general case of filtering facet values, I've wanted to do that too in 
more complex situations, and there is no good way I've found. 

For your very specific use case though, yeah, you can do it with facet.query.  
Leave out the facet.field, but instead:

facet.query=keyword:man
facet.query=keyword:bear
facet.query=keyword:pig

You'll get three facet.query results in the response, one each for man, bear, 
pig. 

Solr behind the scenes will kind of do three seperate 'sub-queries', one for 
each facet.query, but since the query itself should be cached, you shouldn't 
notice much difference. Especially if you have a warming query that facets on 
the keyword field (I'm never entirely sure when caches created by warming 
queries will be used by a facet.query, or if it depends on the facet method in 
use, but it can't hurt). 

Jonathan



Problem with field collapsing

2010-07-28 Thread Moazzam Khan
Hi All,

Whenever I use field collapse, the numFound attribute contains
exactly as many rows as I put in the rows parameter, instead of returning
the total number of documents that matched the query. Is there a way to
rectify this?

Thanks,

Moazzam


Re: SolrCore has a large number of SolrIndexSearchers retained in infoRegistry

2010-07-28 Thread skommuri

Hi,

It didn't seem like it improved the situation. The same exception stack
traces are found. 

I have explicitly defined the index readers to be reopened by specifying in
the solrconfig.xml

The exception occurs when the remote cores are being searched. I am
attaching the exceptions in a text file for reference. 
http://lucene.472066.n3.nabble.com/file/n1002926/solrexceptions.txt
solrexceptions.txt 

Couple of notes:

1. QueryComponent#process
 requests a SolrIndexSearcher twice by calling
SolrQueryRequest#getSearcher(), but it is never closed. I see several
instances where getSearcher is called but is never properly
closed - performing a quick call hierarchy of SolrQueryRequest#getSearcher()
and SolrQueryRequest#close() will illustrate this point.

2. It may be the case that this exception was never encountered because
typical deployments are not heavily using Distributed Search across multiple
Solr Cores and/or it's a small memory leak and so never noticed?

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCore-has-a-large-number-of-SolrIndexSearchers-retained-in-infoRegistry-tp483900p1002926.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Using Solr to perform range queries in Dspace

2010-07-28 Thread Chris Hostetter

: I'm trying to use dspace to search across a range of index created and stored
: using Dsindexer.java class. I have seen where Solr can be use to perform

I've never heard of Dsindexer.java but since this is the first result 
google returns...

http://scm.dspace.org/trac/dspace/browser/trunk/dspace/src/org/dspace/search/DSIndexer.java?rev=970

...i'm going to assume that's what you are talking about.

: numerical range queries using either TrieIntField,
: TrieDoubleField,TrieLongField, etc.. classes defined in Solr's api or 
: SortableIntField.java, SortableLongField,SortableDoubleField.java. I would
: like to know how to implement these classes in Dspace so that I can be able
: to perform numerical range queries. Any help would be greatly apprciated.

i *think* what you are asking is how to use Solr to search the numeric 
fields in an existing Lucene index (created by the above mentioned java 
code) -- but i may be wrong (your choice of wording "implement these 
classes in Dspace" is very perplexing to me).

If i'm understanding correctly, then the key to the issue is all in how 
the numeric values are indexed as lucene Fields in your existing code -- 
but in the copy of DSIndexer.java i found, there are no numeric fields, 
just Text fields.  If you are indexing the numeric values as simple 
strings, then in Solr you would want to refer to them using the legacy 
IntField, FloatField, etc... these assume simple string 
representations, and will sort properly using the numeric FieldCache -- 
BUT! -- range queries won't work.  Range queries require that the indexed 
terms be in a logical ordering, which isn't true for simple string 
representations of numbers ("100" is lexicographically before "2").

If i actually have your question backwards -- if what you are asking is 
how to modify the DSIndexer.java class to index fields in the same way as 
TrieDoubleField,TrieLongField,SortableIntField, etc... then the 
answer is much simpler: all FieldType's in Solr implement toInternal and 
toExternal methods ... the toInternal is what you need to call to encode 
your simple numeric values into the format to be indexed -- toExternal (or 
toObject) is how you can get the original value back out.

For the Trie fields, these actually just use some utilities in Lucene, 
so you could look at the code and use the same utilities w/o ever needing 
any Solr source code.

If i've completely misunderstood your question, please post a followup 
explaining in more detail what it is you are trying to accomplish.

-Hoss



Know which terms are in a document

2010-07-28 Thread Max Lynch
I would like to search against my index, and then *know* which of a set
of given terms was found in each document.

For example, let's say I want to show articles with the word pizza or
cake in them, but would like to be able to say which of those two was
found.  I might use this to handle the article differently if it is about
pizza, or if it is about cake.  I understand I can do multiple queries but I
would like to avoid that.

One thought I had was to use a highlighter and only return a fragment with
the highlighted word, but I'm not sure how to do this with the various
highlighting options.

Is there a way?

Thanks.


Re: Show elevated Result Differently

2010-07-28 Thread Erick Erickson
Please expand on what this means, it's quite vague. You might review:
http://wiki.apache.org/solr/UsingMailingLists

Best
Erick

On Wed, Jul 28, 2010 at 8:43 AM, Vishal.Arora vis...@value-one.com wrote:


 I want to show elevated results differently from the others. Is there any way to
 do this?
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Show-elevated-Result-Differently-tp1002081p1002081.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: simple question from a newbie

2010-07-28 Thread Erick Erickson
What is the query you submit (don't forget debugQuery=on)? In particular,
what field are you sorting on?

But yes, if you're searching on a tokenized field, you'll get matches on all
tokens in that field, which are probably single words. And no matter how you
sort, you're still getting documents where the whole title doesn't start with
'c'.

What happens if you search on your dc3.title instead? It uses the keyword
tokenizer
which tokenizes the entire title as a single token. Sort by that one too.

Best
Erick

On Wed, Jul 28, 2010 at 12:26 PM, Nguyen, Vincent (CDC/OSELS/NCPHI) (CTR) 
v...@cdc.gov wrote:

 I think I got it to work.  If I do a wildcard search using the dc3.title
 field it seems to work fine (dc3.title:c*).  The dc.title:c* returns
 every title that has a word in it that starts with 'c', which isn't
 exactly what I wanted.  I'm guessing it's because of the
 type=caseInsensitiveSort.

 Well, here is my schema for reference.  Thanks for your help.


 <schema name="example" version="1.1">
   <types>
     <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
     <!-- boolean type: true or false -->
     <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true"/>
     <fieldType name="integer" class="solr.IntField" omitNorms="true"/>
     <fieldType name="long" class="solr.LongField" omitNorms="true"/>
     <fieldType name="float" class="solr.FloatField" omitNorms="true"/>
     <fieldType name="double" class="solr.DoubleField" omitNorms="true"/>
     <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
     <fieldType name="slong" class="solr.SortableLongField" sortMissingLast="true" omitNorms="true"/>
     <fieldType name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true"/>
     <fieldType name="sdouble" class="solr.SortableDoubleField" sortMissingLast="true" omitNorms="true"/>
     <fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/>
     <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
       <analyzer>
         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       </analyzer>
     </fieldType>
     <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
       <analyzer type="index">
         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
         <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
         <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
                 generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
         <filter class="solr.LowerCaseFilterFactory"/>
         <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
       </analyzer>
       <analyzer type="query">
         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
         <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
         <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
                 generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
         <filter class="solr.LowerCaseFilterFactory"/>
         <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
       </analyzer>
     </fieldType>
     <fieldType name="textTight" class="solr.TextField" positionIncrementGap="100">
       <analyzer>
         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
         <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
         <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
                 generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
         <filter class="solr.LowerCaseFilterFactory"/>
         <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
       </analyzer>
     </fieldType>
     <fieldType name="caseInsensitiveSort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
       <analyzer>
         <tokenizer class="solr.KeywordTokenizerFactory"/>
         <filter class="solr.LowerCaseFilterFactory"/>
         <filter class="solr.TrimFilterFactory"/>
       </analyzer>
     </fieldType>
     <fieldtype name="ignored" stored="false" indexed="false" class="solr.StrField"/>
   </types>
   <fields>
     <!-- Fedora specific fields -->
     <field name="PID" type="string" indexed="true" stored="true"/>
     <field name="fgs.state" type="string" indexed="true" stored="true"/>
     <field name="fgs.label" type="text" indexed="true" stored="true"/>
     <field name="fgs.ownerId" type="string" indexed="true" stored="true"/>
     <field name="fgs.createdDate" type="date" indexed="true" stored="true"/>
     <field name="fgs.lastModifiedDate" type="date" indexed="true" stored="true"/>
     <field name="fgs.contentModel" type="string" indexed="true" stored="true"/>
     <field name="fgs.type" type="string" indexed="true" stored="true" multiValued="true"/>
     <!-- DC Fields -->
     <field name="dc.contributor" type="text" indexed="true" stored="true" multiValued="true"/>
     <field name="dc.coverage" type="text" indexed="true" stored="true"
 

Re: Solr using 1500 threads - is that normal?

2010-07-28 Thread dc tech
1,500 threads seems extreme by any standards so there is something
happening in your install. Even with appservers for web apps,
typically 100 would be a fair # of threads.


On 7/28/10, Christos Constantinou ch...@simpleweb.co.uk wrote:
 Hi,

 Solr seems to be crashing after a JVM exception that new threads cannot be
 created. I am writing in hope of advice from someone that has experienced
 this before. The exception that is causing the problem is:

 Exception in thread btpool0-5 java.lang.OutOfMemoryError: unable to create
 new native thread

 The memory that is allocated to Solr is 3072MB, which should be enough
 memory for a ~6GB data set. The documents are not big either, they have
 around 10 fields of which only one stores large text ranging between 1k-50k.

 The top command at the time of the crash shows Solr using around 1500
 threads, which I assume is not normal. Could it be that the threads are
 crashing one by one and new ones are created to cope with the queries?

 In the log file, right after the the exception, there are several thousand
 commits before the server stalls completely. Normally, the log file would
 report 20-30 document existence queries per second, then 1 commit per 5-30
 seconds, and some more infrequent faceted document searches on the data.
 However after the exception, there are only commits until the end of the log
 file.

 I am wondering if anyone has experienced this before or if it is some sort
 of known bug from Solr 1.4? Is there a way to increase the details of the
 exception in the logfile?

 I am attaching the output of a grep Exception command on the logfile.

 Jul 28, 2010 8:19:31 AM org.apache.solr.common.SolrException log
 SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
 exceeded limit of maxWarmingSearchers=2, try again later.
 Jul 28, 2010 8:19:31 AM org.apache.solr.common.SolrException log
 SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
 exceeded limit of maxWarmingSearchers=2, try again later.
 Jul 28, 2010 8:19:31 AM org.apache.solr.common.SolrException log
 SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
 exceeded limit of maxWarmingSearchers=2, try again later.
 Jul 28, 2010 8:19:32 AM org.apache.solr.common.SolrException log
 SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
 exceeded limit of maxWarmingSearchers=2, try again later.
 Jul 28, 2010 8:20:18 AM org.apache.solr.common.SolrException log
 SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
 exceeded limit of maxWarmingSearchers=2, try again later.
 Jul 28, 2010 8:20:48 AM org.apache.solr.common.SolrException log
 SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
 exceeded limit of maxWarmingSearchers=2, try again later.
 Jul 28, 2010 8:22:43 AM org.apache.solr.common.SolrException log
 SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
 exceeded limit of maxWarmingSearchers=2, try again later.
 Jul 28, 2010 8:27:53 AM org.apache.solr.common.SolrException log
 SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
 exceeded limit of maxWarmingSearchers=2, try again later.
 Jul 28, 2010 8:27:53 AM org.apache.solr.common.SolrException log
 SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
 exceeded limit of maxWarmingSearchers=2, try again later.
 Jul 28, 2010 8:27:53 AM org.apache.solr.common.SolrException log
 SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
 exceeded limit of maxWarmingSearchers=2, try again later.
 Jul 28, 2010 8:28:50 AM org.apache.solr.common.SolrException log
 SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
 exceeded limit of maxWarmingSearchers=2, try again later.
 Jul 28, 2010 8:33:19 AM org.apache.solr.common.SolrException log
 SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
 exceeded limit of maxWarmingSearchers=2, try again later.
 Jul 28, 2010 8:35:08 AM org.apache.solr.common.SolrException log
 SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
 exceeded limit of maxWarmingSearchers=2, try again later.
 Jul 28, 2010 8:35:58 AM org.apache.solr.common.SolrException log
 SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
 exceeded limit of maxWarmingSearchers=2, try again later.
 Jul 28, 2010 8:35:59 AM org.apache.solr.common.SolrException log
 SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
 exceeded limit of maxWarmingSearchers=2, try again later.
 Jul 28, 2010 8:44:31 AM org.apache.solr.common.SolrException log
 SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
 exceeded limit of maxWarmingSearchers=2, try again later.
 Jul 28, 2010 8:51:49 AM org.apache.solr.common.SolrException log
 SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
 exceeded limit of 

Re: Solr using 1500 threads - is that normal?

2010-07-28 Thread Erick Erickson
Your commits are very suspect. How often are you making changes to your
index?
Do you have autocommit on? Do you commit when updating each document?
Committing too often and consequently firing off warmup queries is the first
place I'd look. But I agree with dc tech, 1,500 is way more than I would
expect.
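If the commits come from an autocommit setting, that lives in solrconfig.xml; a hedged sketch (the element names are standard Solr 1.4 config, but the numbers are purely illustrative and would need tuning for your load):

```xml
<!-- solrconfig.xml: batch updates into fewer commits so new searchers
     aren't warmed dozens of times per minute -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>10000</maxDocs>  <!-- commit after this many docs -->
    <maxTime>60000</maxTime>  <!-- or after this many milliseconds -->
  </autoCommit>
</updateHandler>
```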

Best
Erick



On Wed, Jul 28, 2010 at 6:53 AM, Christos Constantinou 
ch...@simpleweb.co.uk wrote:

 Hi,

 Solr seems to be crashing after a JVM exception that new threads cannot be
 created. I am writing in hope of advice from someone that has experienced
 this before. The exception that is causing the problem is:

 Exception in thread btpool0-5 java.lang.OutOfMemoryError: unable to
 create new native thread

 The memory that is allocated to Solr is 3072MB, which should be enough
 memory for a ~6GB data set. The documents are not big either, they have
 around 10 fields of which only one stores large text ranging between 1k-50k.

 The top command at the time of the crash shows Solr using around 1500
 threads, which I assume is not normal. Could it be that the threads are
 crashing one by one and new ones are created to cope with the queries?

 In the log file, right after the the exception, there are several thousand
 commits before the server stalls completely. Normally, the log file would
 report 20-30 document existence queries per second, then 1 commit per 5-30
 seconds, and some more infrequent faceted document searches on the data.
 However after the exception, there are only commits until the end of the log
 file.

 I am wondering if anyone has experienced this before or if it is some sort
 of known bug from Solr 1.4? Is there a way to increase the details of the
 exception in the logfile?

 I am attaching the output of a grep Exception command on the logfile.

 Jul 28, 2010 8:19:31 AM org.apache.solr.common.SolrException log
 SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
 exceeded limit of maxWarmingSearchers=2, try again later.
 Jul 28, 2010 8:19:31 AM org.apache.solr.common.SolrException log
 SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
 exceeded limit of maxWarmingSearchers=2, try again later.
  [...]

Re: WordDelimiterFilter and phrase queries?

2010-07-28 Thread Chris Hostetter
: pos token offset
: 1 3 0-1
: 2 diphenyl 2-10
: 3 propanoic 11-20
: 3 diphenylpropanoic 2-20

: Say someone enters the query string 3-diphenylpropanoic
: 
: The query parser I'm using transforms this into a phrase query and the
: indexed form is missed because based the positions of the terms '3'
: and 'diphenylpropanoic' indicate they are not adjacent?
: 
: Is this intended behavior? I expect that the catenated word
: 'diphenylpropanoic' should have a position of 2 based on the position
: of the first term in the concatenation, but perhaps I'm missing

I believe this is correct, but I'm not certain of the reason - I think 
it's just an implementation detail.  Consider the opposite scenario: if 
your indexed text was "diphenyl-propanoic-3" and things worked the way 
you are suggesting they should, the term "diphenylpropanoic" would end 
up at position 1 (with "diphenyl"), and "diphenylpropanoic-3" would not 
match because then the terms wouldn't be adjacent.

damned if you do, damned if you don't

typically for fields where you are using WDF with the catenate options 
you would usually use a bit of slop on the generated phrase queries to 
allow for the looseness of the position information.  (in an ideal world, 
the token stream wouldn't have monotonic integer positions, it would be 
a DAG, and then these things would be easily represented, but that's 
pretty non-trivial to do with the current internals.)
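For example, a phrase query with a little slop tolerates the off-by-one positions the catenated term introduces (the field name here is hypothetical):

```
text:"3 diphenylpropanoic"~2
```

With the dismax handler, the same thing can be applied globally via the ps (phrase slop) parameter.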


-Hoss



Re: Scoring Search for autocomplete

2010-07-28 Thread Chris Hostetter

You weren't really clear on how you are generating your autocomplete 
results -- ie: via TermsComponent on your main index? or via a 
search on a custom index where each document is a word to be suggested?

Assuming the latter, the approach you describe below sounds good to 
me, but it doesn't seem like it would really make sense for the former.


: Hi, I have an autocomplete that is currently working with an
: NGramTokenizer so if I search for Yo both New York and Toyota
: are valid results.  However I'm trying to figure out how to best
: implement the search so that from a score perspective if the string
: matches the beginning of an entire field it ranks first, followed by
: the beginning of a term and then in the middle of a term.  For example
: if I was searching with vi I would want Virginia ahead of West
: Virginia ahead of Five.
: 
: I think I can do this with three separate fields, one using a white
: space tokenizer and a ngram filter, another using the edge-ngram +
: whitespace and another using keyword+edge-ngram, then doing an or on
: the 3 fields, so that Virginia would match all 3 and get a higher
: score... but this doesn't feel right to me, so I wanted to check for
: better options.
: 
: Thanks.
: 
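A hedged sketch of that three-field idea in schema.xml terms (the type names are made up and the gram sizes are illustrative; the filter factory classes are the standard Solr ones, but the details would need tuning):

```xml
<!-- match anywhere inside a token -->
<fieldType name="ac_ngram" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
</fieldType>
<!-- match a prefix of each token -->
<fieldType name="ac_edge" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15"/>
  </analyzer>
</fieldType>
<!-- match a prefix of the whole field value -->
<fieldType name="ac_full" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="30"/>
  </analyzer>
</fieldType>
```

Querying all three with boosts (e.g. dismax qf set to something like ac_full^3 ac_edge^2 ac_ngram^1) would give the whole-field prefix match the highest score, as the question describes.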



-Hoss



Help with schema design

2010-07-28 Thread Pramod Goyal
Hi,
I have a use case where I get a document and a list of events that have
happened on the document. For example:

First document:
 Some text content
Events:
  Event TypeEvent By Event Time
  Update  Pramod  06062010 2:30:00
  Update  Raj 06062010 2:30:00
  View Rahul  07062010 1:30:00


I would like to support queries like get all documents with Event Type = ? and
Event Time greater than ?, and also queries like get all the documents updated
by Pramod.
How should i design my schema to support this use case.

Thanks,
Regards,
Pramod Goyal


Is solr able to merge index on different nodes

2010-07-28 Thread Chengyang
When I want to create a large index, can I split the indexing across different nodes 
and then merge all the indexes onto one node?
Any further suggestions for this case?


Re: logic required for newbie

2010-07-28 Thread rajini maski
First of all, I hope that in the schema you have marked the fields as
indexed="true" and stored="true"...
Next, if you have done so, just search with q=landmark:piza... you
will get only the matching result set.

Note: there is one constraint about the analyzers and tokenizers applied. If
you apply the whitespace tokenizer (that is, data type text_ws), only then
will you get the "piza hut" result set when you query for "piza". If no
tokenizer is applied, you will not get it.
I hope this was the needed reply. If it's something else, feel free to ask. ;)


On Wed, Jul 28, 2010 at 8:42 PM, Jonty Rhods jonty.rh...@gmail.com wrote:

 Hi

 thanks for reply..
 Actually the requirement is different (sorry if I was unable to clarify it in the
 first mail).

 basically follwoing are the fields name in schema as well:
  1. id
  2. name
  3. user_id
  4. location
  5. country
  6. landmark1
  7. landmark2
  8. landmark3
  9. landmark4
  10. landmark5

 which carry text...
 for example:

 id1/id
 namesome name/name
 user_iduser_id/user_id
 locationnew york/location
 countryUSA/country
 landmark15th avenue/landmark1
 landmark2ms departmental store/landmark2
 landmark3base bakery/landmark3
 landmark4piza hut/landmark4
 landmark5ford motor/landmark5

 now if user search by piza then expected result like:

 id1/id
 namesome name/name
 user_iduser_id/user_id
 locationnew york/location
 countryUSA/country
 landmark4piza hut/landmark4

 It means I want to ignore all the other landmarks which don't match. With a filter we
 can filter on fields, but here I don't know
 the field name because it depends on the text match.

 is there any other solution.. I am ready to change in schema or in logic. I
 am using solrj.

 please help me I stuck here..

 with regards


 On Wed, Jul 28, 2010 at 7:22 PM, rajini maski rajinima...@gmail.com
 wrote:

  you can index each of these fields separately...
  field1- Id
  field2- name
  field3-user_id
  field4-country.
 
  
  field7- landmark
 
   While querying you can specify q=Landmark9. This will return you
   results..
  And if you want only particular fields in output.. use the fl parameter
  in
  query...
 
  like
 
   http://localhost:8090/solr/select?indent=on&q=landmark9&fl=ID,user_id,country,landmark
 
  This will give your desired solution..
 
 
 
 
  On Wed, Jul 28, 2010 at 12:23 PM, Jonty Rhods jonty.rh...@gmail.com
  wrote:
 
   Hi All,
  
   I am very new and learning solr.
  
   I have 10 column like following in table
  
   1. id
   2. name
   3. user_id
   4. location
   5. country
   6. landmark1
   7. landmark2
   8. landmark3
   9. landmark4
   10. landmark5
  
   when user search for landmark then  I want to return only one landmark
   which
   match. Rest of the landmark should ingnored..
   expected result like following if user search by landmark2..
  
   1. id
   2. name
   3. user_id
   4. location
   5. country
   7. landmark2
  
   or if search by landmark9
  
   1. id
   2. name
   3. user_id
   4. location
   5. country
   9. landmark9
  
  
   please help me to design the schema for this kind of requirement...
  
   thanks
   with regards
  
 



Re: SolrJ Response + JSON

2010-07-28 Thread rajini maski
Yeah right... This query will do it

http://localhost:8090/solr/select/?q=*:*&version=2.2&start=0&rows=10&indent=on&wt=json

This will do your work... This is more like using the XSL transformation
supported by Solr. :)

Regards,
Rajani Maski


On Wed, Jul 28, 2010 at 6:24 PM, Mark Allan mark.al...@ed.ac.uk wrote:

 I think you should just be able to add wt=json to the end of your query
 (or change whatever the existing wt parameter is in your URL).

 Mark


 On 28 Jul 2010, at 12:54 pm, MitchK wrote:


 Hello community,

I need to transform SolrJ responses into JSON after another application has
finished some computation on those results.

 I can not do those computations on the Solr - side.

 So, I really have to translate SolrJ's output into JSON.

 Any experiences how to do so without writing your own JSON-writer?

 Thank you.
 - Mitch
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/SolrJ-Response-JSON-tp1002024p1002024.html
 Sent from the Solr - User mailing list archive at Nabble.com.



 --
 The University of Edinburgh is a charitable body, registered in
 Scotland, with registration number SC005336.