facet total score instead of total count
Hi, I have a requirement where I want to sum up the scores of the faceted fields. This will decide the relevancy for us. Is there a way to do it on a facet field? Basically, instead of giving the count of records for a facet field, I would like to have the total sum of scores for those records. Any help is greatly appreciated. Thanks Bharat Jain
logic required for newbie
Hi All, I am very new and learning Solr. I have 10 columns like the following in a table:
1. id
2. name
3. user_id
4. location
5. country
6. landmark1
7. landmark2
8. landmark3
9. landmark4
10. landmark5
When a user searches for a landmark, I want to return only the one landmark that matches; the rest of the landmarks should be ignored. Expected result if the user searches by landmark2:
1. id
2. name
3. user_id
4. location
5. country
7. landmark2
or if they search by landmark9:
1. id
2. name
3. user_id
4. location
5. country
9. landmark9
Please help me design the schema for this kind of requirement. Thanks, with regards
Re: question about relevance
Well you are correct Erik that this is a database-ish thing try to achieve in solr and unfortunately the sin :) had been committed by somebody else :) and now we are running into relevancy issues. Let me try to state the problem more casually. 1. There are user records of type A, B, C etc. (userId field in index is common to all records) 2. A user can have any number of A, B, C etc (e.g. think of A being a language then user can know many languages like french, english, german etc) 3. Records are currently stored as a document in index. 4. A given query can match multiple records for the user 5. If for a user more records are matched (e.g. if he knows both french and german) then he is more relevant and should come top in UI. This is the reason I wanted to add lucene scores assuming the greater score means more relevance. Hope you got what I was saying. Another idea for this situation is doing faceting on userId field and then add the score but currently I think lucene only support facet count, basically solr will give you only count of docs it matched. Can I get sum of the score of documents that matched? Thanks Bharat Jain On Tue, Jul 27, 2010 at 5:58 AM, Erick Erickson erickerick...@gmail.comwrote: I'm having trouble getting my head around what you're trying to accomplish, so if this is off base you know why G. But what it smells like is that you're trying to do database-ish things in a SOLR index, which is almost always the wrong approach. Is there a way to index redundant data with each document so all you have to do to get the relevant users is a simple query? Adding scores is also suspect.. I don't see how that does predictable things. But I'm also failing completely to understand what a relevant user is. not much help, if this is way off base perhaps you could provide some additional use-cases? Best Erick On Mon, Jul 26, 2010 at 2:37 AM, Bharat Jain bharat.j...@gmail.com wrote: Hello All, I have a index which store multiple objects belonging to a user for e.g. schema field name=objType type=... / - Identifies user object type e.g. userBasic or userAdv !-- obj 1 -- field name=first_name type=... / MAPS to userBasicInfoObject field name=last_name type=... / !-- obj 2 -- field name=user_data_1 type=... / - MAPS to userAdvInfoObject field name=user_data_2 type=... / /schema Now when I am doing some query I get multiple records mapping to java objects (identified by objType) that belong to the same user. Now I want to show the relevant users at the top of the list. I am thinking of adding the Lucene scores of different result documents to get the best scores. Is this correct approach to get the relevance of the user? Thanks Bharat Jain
Re: Any tips/guidelines for tuning Solr/Lucene performance in a master/slave/sharding environment
Hi, I think the starting point should be: http://wiki.apache.org/solr/SolrPerformanceFactors For example, you could start playing with the mergeFactor parameter. My 2 cents, Tommaso 2010/7/27 Chengyang atreey...@163.com How do we reduce the index file size, decrease the sync time between nodes, and decrease the index create/update time? Thanks.
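For reference, mergeFactor lives in the index sections of solrconfig.xml; a sketch with illustrative values only (a lower mergeFactor means fewer segments and faster searches, at the cost of slower indexing):

    <!-- solrconfig.xml -->
    <indexDefaults>
      <mergeFactor>10</mergeFactor>
      <ramBufferSizeMB>32</ramBufferSizeMB>
    </indexDefaults>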
Re: Extracting PDF text/comment/callout/typewriter boxes with Solr CELL/Tika/PDFBox
I attached a patch for the Solr 1.4.1 release on https://issues.apache.org/jira/browse/SOLR-1902 that made things work for me. This strange behaviour for me was due to the fact that I copied the patched jars and war inside the dist directory but forgot to update the war inside the example/webapps directory (that is inside Jetty). Hope this helps. Tommaso
2010/7/27 David Thibault dthiba...@esperion.com Alessandro & all, I was having the same issue with Tika crashing on certain PDFs. I also noticed the bug where no content was extracted after upgrading Tika. When I went to the SOLR issue you link to below, I applied all the patches, downloaded the Tika 0.8 jars, restarted Tomcat, posted a file via curl, and got the following error:
SEVERE: java.lang.NoSuchMethodError: org.apache.solr.core.SolrResourceLoader.getClassLoader()Ljava/lang/ClassLoader;
    at org.apache.solr.handler.extraction.ExtractingRequestHandler.inform(ExtractingRequestHandler.java:93)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:244)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:231)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
    at org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:859)
    at org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:579)
    at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1555)
    at java.lang.Thread.run(Thread.java:619)
This is really weird because I DID apply the SolrResourceLoader patch that adds the getClassLoader method. I even verified by opening up the JARs and looking at the class file in Eclipse... I can see the SolrResourceLoader.getClassLoader() method. Does anyone know why it can't find the method? After patching the source I did "ant clean dist" in the base directory of the Solr source tree and everything looked like it compiled (BUILD SUCCESSFUL). Then I copied all the jars from dist/ and all the library dependencies from contrib/extraction/lib/ into my SOLR_HOME. Restarting Tomcat, everything in the logs looked good. I'm stumped. It would be very nice to have a Solr implementation using the newest versions of PDFBox & Tika and actually have content being extracted... =) Best, Dave
-----Original Message----- From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com] Sent: Tuesday, July 27, 2010 6:09 AM To: solr-user@lucene.apache.org Subject: Re: Extracting PDF text/comment/callout/typewriter boxes with Solr CELL/Tika/PDFBox Hi Jon, During the last days we faced the same problem.
Using Solr 1.4.1 classic (Tika 0.4), from some PDF files we can't extract content, and from others Solr throws an exception during the indexing process. You must: update the Tika libraries (in /contrib/extraction/lib) with the tika-core 0.8 snapshot and tika-parsers 0.8, and update PDFBox and all related libraries. After that you have to patch Solr 1.4.1 following this patch: https://issues.apache.org/jira/browse/SOLR-1902?page=com.atlassian.jira.plugin.ext.subversion%3Asubversion-commits-tabpanel This is the first way to solve the problem. Using Solr 1.4.1 (with the Tika 0.8 snapshot and PDFBox updated) no exception is thrown during the indexing process, but no content is extracted. Using the latest Solr trunk (with the Tika 0.8 snapshot and PDFBox updated) all sounds good, but we don't know how stable it is! I hope you now have a clear vision of this issue. Best Regards 2010/7/26 Sharp, Jonathan jsh...@coh.org Every so often I need to index new batches of scanned PDFs, and occasionally Adobe's OCR can't recognize the text in a couple of these documents. In these situations I would like to type a small amount of text onto the document and have it be extracted by Solr CELL. Adobe Pro 9 has a number of different ways to add text directly to a PDF file: *Typewriter *Sticky Note *Callout boxes *Text boxes I tried indexing documents with each of these text additions
Re: SpatialSearch: sorting by distance
Does anybody know if this feature works correctly? Or am I doing something wrong? 2010/7/27 Pavel Minchenkov char...@gmail.com Hi, I'm trying to sort by distance like this: sort=dist(2,lat,lon,55.755786,37.617633) asc In general results are sorted, but some documents are not in the right order. I'm using DistanceUtils.getDistanceMi(...) from Lucene spatial to calculate the real distance after reading documents from Solr. The Solr version is from trunk.
    <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <field name="lat" type="double" indexed="true" stored="true"/>
    <field name="lon" type="double" indexed="true" stored="true"/>
Thanks. -- Pavel Minchenkov
Re: Integration Problem
Nobody out there who can help me with this problem? I need to edit the result of the javabin writer (adding the results from the webservice). I hope it is possible to do that. Thanks in advance. On Mon 26.07.2010 10:25, Jörg Wißmeier wrote: Hi everybody, I've been working with Solr for a while and have integrated it with Liferay 6.0.3, so every search request from Liferay is processed by Solr and its index. But I have to integrate another system, which offers me a webservice. The results of this webservice should appear in the results of Solr but not in its index. I tried to do that with a custom query handler and a custom response writer, and I'm able to write into the response message of Solr, but only into the response node of the XML message and not into the results node. So is there any solution for writing into the results node of the XML message from Solr? Thanks in advance, best regards, Joerg
solr log file rotation
Hi all, I am running a Solr 1.4 instance on FreeBSD that generates large log files in very short periods. I used /etc/newsyslog to configure log file rotation; however, once the log file is rotated, Solr doesn't write logs to the new file. I'm wondering if there is a way to let Solr know that the log file has been rotated so that it recreates a correct file handle? Thanks Christos
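One alternative worth trying, assuming the stock Solr 1.4 setup that logs through java.util.logging (e.g. the bundled Jetty): let the JVM rotate the log itself, so no external tool has to swap the file out from under a live file handle. A logging.properties sketch, passed via -Djava.util.logging.config.file (the path and sizes are illustrative):

    handlers = java.util.logging.FileHandler
    java.util.logging.FileHandler.pattern = /var/log/solr/solr.%g.log
    java.util.logging.FileHandler.limit = 104857600
    java.util.logging.FileHandler.count = 10
    java.util.logging.FileHandler.append = true
    java.util.logging.FileHandler.formatter = java.util.logging.SimpleFormatter

Here %g is the rotation generation number, limit is the per-file size in bytes, and count is how many rotated files to keep.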
Re: Spellchecking and frequency
Hi Mark, Thanks for that info, it looks very interesting; it would be great to see your code. Out of interest, did you use the dictionary and the phonetic file? Did you see better results with both? In regard to the secondary part that checks the corpus for matching suggestions, would another way to do this be to have an event listener that listens for commits and then builds the dictionary of matching corpus words that way? Then you avoid the performance hit at query time. Cheers, Dan
On Tue, Jul 27, 2010 at 7:04 PM, Mark Holland mark.holl...@zoopla.co.uk wrote: Hi, I found the suggestions returned from the standard Solr spellcheck not to be that relevant. By contrast, aspell, given the same dictionary and misspelled words, gives much more accurate suggestions. I therefore wrote an implementation of SolrSpellChecker that wraps Jazzy, the Java aspell library. I also extended the SpellCheckComponent to take the matrix of suggested words and query the corpus to find the first combination of suggestions which returned a match. This works well for my use case, where term frequency is irrelevant to spelling or scoring. I'd like to publish the code in case someone finds it useful (although it's a bit crude at the moment and will need a decent tidy up). Would it be appropriate to open up a Jira issue for this? Cheers, ~mark
On 27 July 2010 09:33, dan sutton danbsut...@gmail.com wrote: Hi, I've recently been looking into spellchecking in Solr, and was struck by how limited the usefulness of the tool was. Like most corpora, ours contains lots of different spelling mistakes for the same word, so 'spellcheck.onlyMorePopular' is not really that useful unless you click on it numerous times. I was thinking that since most of the time people spell words correctly, why is there no other frequency parameter that could enter into the score? I.e. something like: spell_score ~ edit_dist * freq I'm sure others have come across this issue and was wondering what steps/algorithms they have used to overcome these limitations? Cheers, Dan
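For what spell_score ~ edit_dist * freq could look like concretely, here is a self-contained Java sketch; it is not Solr's implementation, and the log damping of the frequency is just one plausible choice:

    public class FreqAwareSpellScore {

        // classic dynamic-programming Levenshtein distance
        static int editDistance(String a, String b) {
            int[][] d = new int[a.length() + 1][b.length() + 1];
            for (int i = 0; i <= a.length(); i++) d[i][0] = i;
            for (int j = 0; j <= b.length(); j++) d[0][j] = j;
            for (int i = 1; i <= a.length(); i++) {
                for (int j = 1; j <= b.length(); j++) {
                    int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                    d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                       d[i - 1][j - 1] + cost);
                }
            }
            return d[a.length()][b.length()];
        }

        /** Higher is better: string similarity weighted by how often the candidate occurs. */
        static double spellScore(String misspelled, String candidate, int docFreq) {
            double maxLen = Math.max(misspelled.length(), candidate.length());
            double similarity = 1.0 - editDistance(misspelled, candidate) / maxLen;
            return similarity * Math.log(1 + docFreq);
        }

        public static void main(String[] args) {
            // a frequent correct spelling outranks a rare misspelling of similar distance
            System.out.println(spellScore("beleive", "believe", 5000)); // frequent candidate
            System.out.println(spellScore("beleive", "beleve", 3));     // rare candidate
        }
    }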
Re: Indexing Problem: Where's my data?
Make sure to set stored="true" on every field you expect to be returned in your results for later display. Chantal
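For example, in schema.xml (the field name here is hypothetical):

    <field name="title" type="text" indexed="true" stored="true"/>
    <!-- indexed="true" makes the field searchable; stored="true" makes it returnable -->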
Re: DIH : SQL query (sub-entity) is executed although variable is not set (null or empty list)
Hi Lance! On Wed, 2010-07-28 at 02:31 +0200, Lance Norskog wrote: Should this go into the trunk, or does it only solve problems unique to your use case? The solution is generic but is an extension of XPathEntityProcessor, because I didn't want to touch the solr.war. This way I can deploy the extension into SOLR_HOME/lib. The problem that it solves is not one with XPathEntityProcessor but more general. What it does: it adds an attribute to the entity that I called skipIfEmpty, which takes the variable (it could even take more variables separated by whitespace). On entityProcessor.init(), which is called for sub-entities per row of the root entity (i.e. before every new request to the data source), the value of the attribute is resolved, and if it is null or empty (after trimming), the entity is not processed further. This attribute is only allowed on sub-entities. It would probably be nicer to put that somewhere higher up in the class hierarchy so that all entity processors could make use of it. But I don't know how common the use case is - all the examples I found were more or less joins on primary keys. Cheers, Chantal

Here comes the code ==========

    import static org.apache.solr.handler.dataimport.DataImportHandlerException.SEVERE;

    import java.util.Map;
    import java.util.logging.Logger;

    import org.apache.solr.handler.dataimport.Context;
    import org.apache.solr.handler.dataimport.DataImportHandlerException;
    import org.apache.solr.handler.dataimport.XPathEntityProcessor;

    public class OptionalXPathEntityProcessor extends XPathEntityProcessor {
        private Logger log = Logger.getLogger(OptionalXPathEntityProcessor.class.getName());
        private static final String SKIP_IF_EMPTY = "skipIfEmpty";
        private boolean skip = false;

        @Override
        protected void firstInit(Context context) {
            if (context.isRootEntity()) {
                throw new DataImportHandlerException(SEVERE,
                    "OptionalXPathEntityProcessor not allowed for root entities.");
            }
            super.firstInit(context);
        }

        @Override
        public void init(Context context) {
            // resolve the entity's skipIfEmpty attribute; skip the entity if empty
            String value = context.getResolvedEntityAttribute(SKIP_IF_EMPTY);
            if (value == null || value.trim().isEmpty()) {
                skip = true;
            } else {
                super.init(context);
                skip = false;
            }
        }

        @Override
        public Map<String, Object> nextRow() {
            if (skip) return null;
            return super.nextRow();
        }
    }
Solr using 1500 threads - is that normal?
Hi, Solr seems to be crashing after a JVM exception that new threads cannot be created. I am writing in the hope of advice from someone who has experienced this before. The exception that is causing the problem is: Exception in thread btpool0-5 java.lang.OutOfMemoryError: unable to create new native thread. The memory that is allocated to Solr is 3072MB, which should be enough memory for a ~6GB data set. The documents are not big either; they have around 10 fields, of which only one stores large text ranging between 1k-50k. The top command at the time of the crash shows Solr using around 1500 threads, which I assume is not normal. Could it be that the threads are crashing one by one and new ones are created to cope with the queries? In the log file, right after the exception, there are several thousand commits before the server stalls completely. Normally, the log file would report 20-30 document existence queries per second, then 1 commit per 5-30 seconds, and some more infrequent faceted document searches on the data. However, after the exception there are only commits until the end of the log file. I am wondering if anyone has experienced this before or if it is some sort of known bug in Solr 1.4? Is there a way to increase the detail of the exception in the logfile? I am attaching the output of a grep Exception command on the logfile.
Jul 28, 2010 8:19:31 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:19:31 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:19:31 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:19:32 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:20:18 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:20:48 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:22:43 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:27:53 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:27:53 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:27:53 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:28:50 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:33:19 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:35:08 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:35:58 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:35:59 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:44:31 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:51:49 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:55:17 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:55:17 AM
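A note on the log above: each of those SEVERE lines means a commit tried to open a new searcher while two were already warming, which fits the observation of several thousand commits in a row. A common mitigation is to stop committing on every update and let the server batch commits via autoCommit. A minimal solrconfig.xml sketch, with illustrative values only:

    <!-- solrconfig.xml -->
    <updateHandler class="solr.DirectUpdateHandler2">
      <autoCommit>
        <maxDocs>10000</maxDocs>   <!-- commit after this many docs -->
        <maxTime>60000</maxTime>   <!-- or after this many ms -->
      </autoCommit>
    </updateHandler>

    <query>
      <maxWarmingSearchers>2</maxWarmingSearchers> <!-- the limit the log is hitting -->
    </query>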
Re: Strange search
Try deleting solr.SnowballPorterFilterFactory from your analyzer chain. I had similar problems using the German SnowballPorterFilterFactory.
SolrJ Response + JSON
Hello community, I need to transform SolrJ responses into JSON, after some computing on those results by another application has finished. I cannot do those computations on the Solr side. So, I really have to translate SolrJ's output into JSON. Any experiences how to do so without writing your own JSON writer? Thank you. - Mitch
Get unique values
Hi, In my schema I have (inter alia) the fields CollectionID and CollectionName. These two values always match together, which means that for every value of CollectionID there is a matching value of CollectionName. I am interested in a query which allows me to get the unique values of CollectionID with the matching CollectionNames (the rest of the fields are of no interest to me in this query). I was thinking about facets, but they offer a bit more than I need. Anyone have an idea for a query which would allow me to get these results? Cheers, -- Rafał Zawadzki http://dev.bluszcz.net
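If facets are close enough, a query sketch like the following (host, handler, and parameters assumed) returns every distinct CollectionID without fetching any documents:

    http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=CollectionID&facet.limit=-1&facet.mincount=1

Since a facet carries only the field value, one workaround for getting the name too is to index a single combined field at indexing time (e.g. "42|Spring catalogue") and facet on that, splitting on the separator in the client; the combined field and separator are assumptions, not an existing Solr feature.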
Highlighted match snippets highlight non-matched words (such as 0.1 and 0.2)
Hi, I'm observing some strange highlighted words in field value snippets returned from Solr when matched term highlighting (http://wiki.apache.org/solr/HighlightingParameters) is enabled. In some cases, highlighted field value snippets contain highlighted words that are not matches: - this appears to be in addition to highlighting words that are matches - these non-match highlighted words are not pre-highlighted in the indexed content - I've determined these are non-matches by appending debugQuery=1 to the URL and examining the match detail information. I've so far observed this in relation to the strings 0, 0.1, 0.2 and 0.4 in indexed content. Real-life example when searching for [gas]: Relevant matched document result from Solr:
    <doc>
      <str name="description">EXAMPLE prepares an extensive range of traceable calibration gas standards with guaranteed relative uncertainties levels of 0.1% for certain species (PDF 676 KB).</str>
    </doc>
Related highlighted snippet:
    <lst name="7232">
      <arr name="description">
        <str>EXAMPLE prepares an extensive range of traceable calibration <em>gas</em> standards with guaranteed relative uncertainties levels of <em>0.1</em>% for certain species (PDF 676 KB).</str>
      </arr>
    </lst>
Note how the highlight snippet correctly highlights gas and incorrectly highlights 0.1. I've observed similar results for other searches where indexed content contains 0, 0.1, 0.2 and 0.4 and where these numbers are highlighted incorrectly. At this stage I'm trying to determine if this is due to a poor implementation on my behalf or whether this is a bug in Solr. I'd really like to know:
1. Has anyone else observed this behaviour?
2. Might this be a known issue with Solr (I've tried to find out but haven't had any luck)?
3. Can anyone test using something like http://solr/select?hl=true&hl.fl=*&q=(phrase+that+contains+0.1+in+response)&hl.fragsize=0 ?
Thanks, Jon Cram
Re: clustering component
The patch should also work with trunk, but I haven't verified it yet. I've just added a patch against solr trunk to https://issues.apache.org/jira/browse/SOLR-1804. S.
Show elevated Result Differently
I want to show an elevated result differently from the others. Is there any way to do this?
Re: SolrJ Response + JSON
I think you should just be able to add wt=json to the end of your query (or change whatever the existing wt parameter is in your URL). Mark On 28 Jul 2010, at 12:54 pm, MitchK wrote: Hello community, I need to transform SolrJ responses into JSON, after some computing on those results by another application has finished. I cannot do those computations on the Solr side. So, I really have to translate SolrJ's output into JSON. Any experiences how to do so without writing your own JSON writer? Thank you. - Mitch
SolrJ Response + JSON
Hello, Second try to send a mail to the mailing list... I need to translate SolrJ's response into a JSON response. I cannot query Solr directly, because I need to do some math with the response data before I show the results to the client. Any experiences how to translate SolrJ's response into JSON without writing your own JSON writer? Thank you. - Mitch
Re: SolrJ Response + JSON
On 28 Jul 2010, at 2:08 pm, MitchK wrote: Second try to send a mail to the mailing list... Your first attempt got through as well. Here's my original response: I think you should just be able to add wt=json to the end of your query (or change whatever the existing wt parameter is in your URL). Mark On 28 Jul 2010, at 12:54 pm, MitchK wrote: Hello community, I need to transform SolrJ responses into JSON, after some computing on those results by another application has finished. I cannot do those computations on the Solr side. So, I really have to translate SolrJ's output into JSON. Any experiences how to do so without writing your own JSON writer? Thank you. - Mitch
Re: SolrJ Response + JSON
Hi, I got a response to your e-mail in my box 30 minutes ago. Anyway, enable the JSONResponseWriter, if you haven't already, and query with wt=json. Can't get much easier. Cheers, On Wednesday 28 July 2010 15:08:26 MitchK wrote: Hello, Second try to send a mail to the mailing list... I need to translate SolrJ's response into a JSON response. I cannot query Solr directly, because I need to do some math with the response data before I show the results to the client. Any experiences how to translate SolrJ's response into JSON without writing your own JSON writer? Thank you. - Mitch Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
Re: SolrJ Response + JSON
Thank you Markus, Mark. Seems to be a problem with Nabble, not with the mailing list. Sorry. I can create a JSON response when I query Solr directly. But I meant that I query Solr through a SolrJ client (CommonsHttpSolrServer). That means my queries look a little bit like this: http://wiki.apache.org/solr/Solrj#Reading_Data_from_Solr So the response is returned as a QueryResponse object, not as a JSON string. Or am I missing something here? Am 28.07.2010 15:15, schrieb Markus Jelsma: Hi, I got a response to your e-mail in my box 30 minutes ago. Anyway, enable the JSONResponseWriter, if you haven't already, and query with wt=json. Can't get much easier. Cheers, [rest of quoted message snipped - see above]
RE: Extracting PDF text/comment/callout/typewriter boxes with Solr CELL/Tika/PDFBox
Yesterday I did get this working with version 4.0 from trunk. I haven't fully tested it yet, but the content doesn't come through blank anymore, so that's good. Would it be more stable to stick with 1.4.1 and your patch to get to Tika 0.8, or to stick with the 4.0 trunk version? Best, Dave -----Original Message----- From: Tommaso Teofili [mailto:tommaso.teof...@gmail.com] Sent: Wednesday, July 28, 2010 3:31 AM To: solr-user@lucene.apache.org Subject: Re: Extracting PDF text/comment/callout/typewriter boxes with Solr CELL/Tika/PDFBox I attached a patch for Solr 1.4.1 release on https://issues.apache.org/jira/browse/SOLR-1902 that made things work for me. This strange behaviour for me was due to the fact that I copied the patched jars and war inside the dist directory but forgot to update the war inside the example/webapps directory (that is inside Jetty). Hope this helps. Tommaso [rest of quoted thread, including the stack trace, snipped - see the messages above]
Re: Extracting PDF text/comment/callout/typewriter boxes with Solr CELL/Tika/PDFBox
In my opinion, the 1.4.1 version with the patch is more stable, at least until 4.0 is released. 2010/7/28 David Thibault dthiba...@esperion.com Yesterday I did get this working with version 4.0 from trunk. I haven't fully tested it yet, but the content doesn't come through blank anymore, so that's good. Would it be more stable to stick with 1.4.1 and your patch to get to Tika 0.8, or to stick with the 4.0 trunk version? Best, Dave [rest of quoted thread snipped - see the messages above]
Re: SolrJ Response + JSON
You could use org.apache.solr.handler.JsonLoader. That one uses org.apache.noggit.JSONParser internally. I've used the JacksonParser with Spring. http://json.org/ lists parsers for different programming languages. Cheers, Chantal On Wed, 2010-07-28 at 15:08 +0200, MitchK wrote: Hello, Second try to send a mail to the mailing list... I need to translate SolrJ's response into a JSON response. I cannot query Solr directly, because I need to do some math with the response data before I show the results to the client. Any experiences how to translate SolrJ's response into JSON without writing your own JSON writer? Thank you. - Mitch
RE: Solr 3.1 and ExtractingRequestHandler resulting in blank content
If you don't store the content then you can't do highlighting, right? Also, don't you just have to switch the text field to say stored="true" in your schema to store the text? I don't understand why you're differentiating the behavior of ExtractingRequestHandler from the behavior of Solr in general. Doesn't ExtractingRequestHandler just pull the text out of whatever file you send it, and then the rest of the processing happens like any other Solr post? The bug I was experiencing was the same one that someone else brought up on the list yesterday in the emails entitled Extracting PDF text/comment/callout/typewriter boxes with Solr CELL/Tika/PDFBox. It ties back to this bug: https://issues.apache.org/jira/browse/SOLR-1902?page=com.atlassian.jira.plugin.ext.subversion%3Asubversion-commits-tabpanel I saw that email shortly after I sent this one to the list (it figures, doesn't it... =). I tried doing what they suggested on that bug report (patching Solr 1.4.x and using Tika 0.8-SNAPSHOT), but the patches failed when I applied them to my Solr 1.4.1. They have since added a patch for Solr 1.4.1; I haven't tried it yet. However, I did get it working using Solr 4.0 out of trunk (which also uses Tika 0.8 and updated PDFBox jars). I have yet to decide which will be more stable: Solr 4.0 or patched Solr 1.4.1, both with updated PDFBox and Tika jars. Best, Dave
-----Original Message----- From: Lance Norskog [mailto:goks...@gmail.com] Sent: Tuesday, July 27, 2010 8:09 PM To: solr-user@lucene.apache.org Subject: Re: Solr 3.1 and ExtractingRequestHandler resulting in blank content There are two different datasets that Solr (Lucene really) saves from a document: raw storage and the indexed terms. I don't think the ExtractingRequestHandler ever automatically stored the raw data; in fact Lucene works in Strings internally, not raw byte arrays (this is changing). It should be indexed - that means if you search 'text' with a word from the document, it will find those documents and bring back the file name. Your app then has to use the file name. Solr/Lucene is not intended as a general-purpose content store, only an index. The ERH wiki page doesn't quite say this. It describes what the ERH does rather than what it does not do :)
On Mon, Jul 26, 2010 at 12:00 PM, David Thibault dthiba...@esperion.com wrote: Hello all, I'm working on a project with Solr. I had 1.4.1 working OK using ExtractingRequestHandler except that it was crashing on some PDFs. I noticed that the Tika bundled with 1.4.1 was 0.4, which was kind of old. I decided to try updating to 0.7 as per the directions here: http://wiki.apache.org/solr/ExtractingRequestHandler but it was giving me errors (I forget what they were specifically). Then I tried downloading Solr 3.1 from the source repository, which I noticed came with Tika 0.7. I figured this would be an easier route to get working. Now I'm testing with 3.1 and 0.7 and I'm noticing my documents are going into Solr OK, but they all have blank content (no document text stored in Solr). I did see that the default "text" field is not stored. Changing that to stored="true" didn't help. Changing to fmap.content=attr_content&uprefix=attr_content didn't help either. I have attached all relevant info here. Please let me know if someone sees something I don't (it's entirely possible as I'm relatively new to Solr). Schema.xml: <?xml version="1.0" encoding="UTF-8" ?>
<schema name="example" version="1.3">
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true"/>
    <fieldtype name="binary" class="solr.BinaryField"/>
    <fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="date" class="solr.TrieDateField" omitNorms="true" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="tdate" class="solr.TrieDateField" omitNorms="true" precisionStep="6" positionIncrementGap="0"/>
    <fieldType name="pint" class="solr.IntField" omitNorms="true"/>
    <fieldType name="plong" class="solr.LongField" omitNorms="true"/>
    <fieldType name="pfloat" class="solr.FloatField"
RE: Extracting PDF text/comment/callout/typewriter boxes with Solr CELL/Tika/PDFBox
Thanks, I'll try that then. I kind of figured that'd be the answer, but after fighting with Solr ExtractingRequestHandler for 2 days I also just wanted to be done with it once it started working with 4.0... =) However, stability would be better in the long run. Best, Dave -----Original Message----- From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com] Sent: Wednesday, July 28, 2010 9:33 AM To: solr-user@lucene.apache.org Subject: Re: Extracting PDF text/comment/callout/typewriter boxes with Solr CELL/Tika/PDFBox In my opinion, the 1.4.1 version with the patch is more stable, at least until 4.0 is released. [rest of quoted thread snipped - see the messages above]
Re: logic required for newbie
You can index each of these fields separately: field1 - id, field2 - name, field3 - user_id, field4 - country, ..., field7 - landmark. While querying you can specify q=landmark9 and this will return results. And if you want only particular fields in the output, use the fl parameter in the query, like: http://localhost:8090/solr/select?indent=on&q=landmark9&fl=ID,user_id,country,landmark This will give you your desired solution. On Wed, Jul 28, 2010 at 12:23 PM, Jonty Rhods jonty.rh...@gmail.com wrote: Hi All, I am very new and learning Solr. [rest of the original question snipped - see above]
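The same query issued through SolrJ, for completeness (a sketch: the server URL matches the one above, and the field names are the thread's own):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8090/solr");
    SolrQuery q = new SolrQuery("landmark9");
    q.set("indent", "on");                                // same params as the URL above
    q.setFields("ID", "user_id", "country", "landmark");  // the fl parameter
    QueryResponse rsp = server.query(q);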
Re: Extracting PDF text/comment/callout/typewriter boxes with Solr CELL/Tika/PDFBox
This was my same feeling :-) and so I went for the trunk to have things working quickly, but I also have to consider which one is the best version, since I am going to deploy it in the near future in an enterprise environment and choosing the best version is an important step. I am quite new to Solr, but I agree with Alessandro that using a slightly patched release should theoretically be more stable than the trunk, which gets many updates weekly (and daily). Cheers, Tommaso 2010/7/28 David Thibault dthiba...@esperion.com Thanks, I'll try that then. I kind of figured that'd be the answer, but after fighting with Solr ExtractingRequestHandler for 2 days I also just wanted to be done with it once it started working with 4.0... =) However, stability would be better in the long run. Best, Dave [rest of quoted thread snipped - see the messages above]
simple question from a newbie
Hi, I'm new to Solr and have a rather dumb question. I want to do a query that returns all the titles that start with a certain letter. For example I have these titles:
Results of in-mine research in support
Cancer Reports
State injury indicators report
Cancer Reports
Indexed dermal bibliography
Childhood agricultural-related injury report
Childhood agricultural injury prevention
I want the query to return:
Cancer Reports
Cancer Reports
Childhood agricultural-related injury report
Childhood agricultural injury prevention
I want something like a dc.title=c* type query. I know that I can facet by dc.title and then use the parameter facet.prefix=c, but it returns something like this:
Cancer Reports [2]
Childhood agricultural-related injury report [1]
Childhood agricultural injury prevention [1]
Vincent Vu Nguyen Division of Science Quality and Translation Office of the Associate Director for Science Centers for Disease Control and Prevention (CDC) 404-498-6154 Century Bldg 2400 Atlanta, GA 30329
Re: SolrJ Response + JSON
Thank you, Chantal. I have looked at this one: http://www.json.org/java/index.html This seems to be an easy-to-understand implementation. However, I am wondering how to determine whether a SolrDocument's field is multiValued or not. The JSONResponseWriter of Solr looks at the schema configuration; however, the client shouldn't do that. How did you solve that problem? Thanks for sharing ideas. - Mitch Am 28.07.2010 15:35, schrieb Chantal Ackermann: You could use org.apache.solr.handler.JsonLoader. That one uses org.apache.noggit.JSONParser internally. I've used the JacksonParser with Spring. http://json.org/ lists parsers for different programming languages. Cheers, Chantal [rest of quoted message snipped - see above]
display solr result in JSP
I am new to Solr. I just got the example XML files indexed and searchable by following the Solr tutorial. I wonder how I can get the search results displayed in a JSP. I really appreciate any suggestions you can give. Thanks so much, Xiaohui
Re: SolrJ Response + JSON
Hi Mitch On Wed, 2010-07-28 at 16:38 +0200, MitchK wrote: Thank you, Chantal. I have looked at this one: http://www.json.org/java/index.html This seems to be an easy-to-understand implementation. However, I am wondering how to determine whether a SolrDocument's field is multiValued or not. The JSONResponseWriter of Solr looks at the schema configuration, but the client shouldn't have to do that. How did you solve that problem? I didn't. I'm not recreating JSON from the SolrJ results. I would try to use the same classes that SolrJ uses, actually. (Writing that without having a further look at the code.) I would avoid recreating existing code as much as possible. About multivalued fields: you need instanceof checks, I guess. The field only contains a list if there really are multiple values. (That's what works for my ScriptTransformer.) Are you sure that you cannot change the Solr results at query time according to your needs? Maybe you should ask for that first (ask for X instead of Y...). Cheers, Chantal
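[For reference, a minimal sketch of the instanceof check Chantal describes, against the SolrJ API of that era; the field name "title" and the surrounding method are illustrative assumptions, not from the thread:]

    import java.util.Collection;
    import org.apache.solr.common.SolrDocument;

    public class FieldValues {
        // getFieldValue returns a single Object for a single-valued field
        // and a Collection when the document really holds multiple values
        static void printValues(SolrDocument doc) {
            Object value = doc.getFieldValue("title"); // "title" is a placeholder field name
            if (value instanceof Collection) {
                for (Object v : (Collection<?>) value) {
                    System.out.println(v); // multivalued case
                }
            } else if (value != null) {
                System.out.println(value); // single-valued case
            }
        }
    }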
Re: logic required for newbie
Hi, thanks for the reply. Actually the requirement is different (sorry if I was unable to clarify it in the first mail). Basically the following are the field names in the schema as well: 1. id 2. name 3. user_id 4. location 5. country 6. landmark1 7. landmark2 8. landmark3 9. landmark4 10. landmark5, which carry text, for example:

<id>1</id>
<name>some name</name>
<user_id>user_id</user_id>
<location>new york</location>
<country>USA</country>
<landmark1>5th avenue</landmark1>
<landmark2>ms departmental store</landmark2>
<landmark3>base bakery</landmark3>
<landmark4>piza hut</landmark4>
<landmark5>ford motor</landmark5>

Now if the user searches by piza then the expected result is:

<id>1</id>
<name>some name</name>
<user_id>user_id</user_id>
<location>new york</location>
<country>USA</country>
<landmark4>piza hut</landmark4>

It means I want to ignore all the other landmarks which do not match. With a filter we can filter the fields, but here I don't know the field name because it depends on the text match. Is there any other solution? I am ready to change the schema or the logic. I am using solrj. Please help me, I am stuck here. with regards On Wed, Jul 28, 2010 at 7:22 PM, rajini maski rajinima...@gmail.com wrote: you can index each of these fields separately... field1- Id field2- name field3- user_id field4- country. field7- landmark While querying you can specify q=landmark9 This will return you results. And if you want only particular fields in the output, use the fl parameter in the query, like http://localhost:8090/solr/select?indent=on&q=landmark9&fl=ID,user_id,country,landmark This will give your desired solution. On Wed, Jul 28, 2010 at 12:23 PM, Jonty Rhods jonty.rh...@gmail.com wrote: Hi All, I am very new and learning solr. I have 10 columns like the following in a table: 1. id 2. name 3. user_id 4. location 5. country 6. landmark1 7. landmark2 8. landmark3 9. landmark4 10. landmark5 When the user searches for a landmark then I want to return only the one landmark which matches. The rest of the landmarks should be ignored. Expected result like the following: if the user searches by landmark2... 1. id 2. name 3. user_id 4. location 5. country 7. landmark2 or if searching by landmark9: 1. id 2. name 3. user_id 4. location 5. country 9. landmark9 Please help me to design the schema for this kind of requirement. thanks with regards
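[One possible way around not knowing the field name up front is to let the highlighter report which landmark field matched. A rough SolrJ 1.4-era sketch, under the assumption that the landmark fields are stored and queried explicitly; the server URL and unique key value are placeholders:]

    import java.util.List;
    import java.util.Map;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class WhichLandmark {
        public static void main(String[] args) throws Exception {
            CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
            // search all five landmark fields explicitly
            SolrQuery q = new SolrQuery(
                "landmark1:piza OR landmark2:piza OR landmark3:piza OR landmark4:piza OR landmark5:piza");
            q.setHighlight(true);
            for (int i = 1; i <= 5; i++) {
                q.addHighlightField("landmark" + i);
            }
            QueryResponse rsp = server.query(q);
            // highlighting results are keyed by unique id, then by field name;
            // only the fields that actually matched appear in the inner map
            Map<String, Map<String, List<String>>> hl = rsp.getHighlighting();
            Map<String, List<String>> matched = hl.get("1"); // "1" = the document's unique key
            System.out.println(matched.keySet()); // e.g. [landmark4]
        }
    }

[The caveat is that highlighting only works on stored fields, so this avoids a schema change at the cost of a little response parsing.]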
Re: display solr result in JSP
Hi, it is very simple to display a value in a JSP. If you are using solrj then simply store the value in a bean from your java class and you can display it. You can do the same thing in a servlet too: get the solr server response and return it in a bean, or display it directly (in the servlet). Hope you will be able to do it. regards Ranveer On Wednesday 28 July 2010 08:11 PM, Ma, Xiaohui (NIH/NLM/LHC) [C] wrote: I am new to Solr. I just got the example XML files indexed and searchable by following the Solr tutorial. I wonder how I can get the search results displayed in a JSP. I really appreciate any suggestions you can give. Thanks so much, Xiaohui
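[To make that concrete, a minimal sketch of the servlet-plus-bean handoff with SolrJ 1.4; the Solr URL, the "title" field, and results.jsp are assumptions for illustration, not from the thread:]

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import javax.servlet.ServletException;
    import javax.servlet.http.*;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrDocument;

    public class SearchServlet extends HttpServlet {
        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws ServletException, IOException {
            try {
                CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
                List<String> titles = new ArrayList<String>();
                for (SolrDocument doc : server.query(new SolrQuery(req.getParameter("q"))).getResults()) {
                    titles.add((String) doc.getFieldValue("title")); // "title" is a placeholder field
                }
                req.setAttribute("titles", titles);
                // the JSP then loops with JSTL: <c:forEach var="t" items="${titles}">${t}</c:forEach>
                req.getRequestDispatcher("/results.jsp").forward(req, resp);
            } catch (Exception e) {
                throw new ServletException(e);
            }
        }
    }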
Re: simple question from a newbie
I think you are using a wild-card search, or should use a wild-card search. But first of all, please provide the schema and configuration file for more details. regards Ranveer On Wednesday 28 July 2010 07:51 PM, Nguyen, Vincent (CDC/OSELS/NCPHI) (CTR) wrote: Hi, I'm new to Solr and have a rather dumb question. I want to do a query that returns all the Titles that start with a certain letter. For example I have these titles: Results of in-mine research in support / Cancer Reports / State injury indicators report / Cancer Reports / Indexed dermal bibliography / Childhood agricultural-related injury report / Childhood agricultural injury prevention. I want the query to return: Cancer Reports / Cancer Reports / Childhood agricultural-related injury report / Childhood agricultural injury prevention. I want something like a dc.title=c* type query. I know that I can facet by dc.title and then use the parameter facet.prefix=c but it returns something like this: Cancer Reports [2] Childhood agricultural-related injury report [1] Childhood agricultural injury prevention [1] Vincent Vu Nguyen Division of Science Quality and Translation Office of the Associate Director for Science Centers for Disease Control and Prevention (CDC) 404-498-6154 Century Bldg 2400 Atlanta, GA 30329
RE: Indexing Problem: Where's my data?
Thanks - but my schema.xml is not recognizing field names specified in the data-config.xml. For example - and I just tested this now - if I have in my data-config.xml:

<field column="product_id" name="pid" />

And then in my schema.xml:

<field name="pid" type="int" indexed="true" stored="true" required="true" />

Then no documents are processed (e.g. I get rows queried, but <str name="Total Documents Processed">0</str> in the data handler UI). But if I change that to:

<field name="product_id" type="int" indexed="true" stored="true" required="true" />

... now documents are processed (e.g. <str name="Total Documents Processed">313</str>). Which, quite frankly, confuses me. I may be doing something else wrong (I changed my SQL as well, so I'm getting another failure, but I think it's separate to this one). -Original Message- From: Lance Norskog [mailto:goks...@gmail.com] Sent: Tuesday, July 27, 2010 8:25 PM To: solr-user@lucene.apache.org Subject: Re: Indexing Problem: Where's my data? Solr respects case for field names. Database fields are supplied in lower-case, so it should be 'attribute_name' and 'string_value'. Also 'product_id', etc. It is easier if you carefully emulate every detail in the examples, for example lower-case names. On Tue, Jul 27, 2010 at 2:59 PM, kenf_nc ken.fos...@realestate.com wrote: for STRING_VALUE, I assume there is a property in the 'select *' results called string_value? if so I'm not sure why it wouldn't work. If not, then that's why, it doesn't have anything to put there. For ATTRIBUTE_NAME, is it possibly a case issue? you called it 'Attribute_Name' in your query, but ATTRIBUTE_NAME in your schema... just something to check I guess. Also, not sure why you are using name= in your fields, for example, <field column="PARENT_FAMILY" name="Parent Family" /> I thought 'column' was the source field name and 'name' was supposed to be the schema field name, and if not there it would assume the 'column' name. You don't have a schema field called Parent Family so it looks like it's defaulting to the column name too, which is lucky for you I suppose. But you may want to either remove 'name=' or make it match the schema. (And I may be completely wrong on this, it's been a while since I got DIH going.) -- Lance Norskog goks...@gmail.com
Re: Spellchecking and frequency
I therefore wrote an implementation of SolrSpellChecker that wraps jazzy, the java aspell library. I also extended the SpellCheckComponent to take the matrix of suggested words and query the corpus to find the first combination of suggestions which returned a match. This works well for my use case, where term frequency is irrelevant to spelling or scoring. This is interesting to me. I also have not been that happy with the standard solr spellcheck. In addition to possibly filing a JIRA for a future fix to Solr itself, another option would be to make your 'alternate' SpellCheck component available as a separate .jar, so anyone could use it just by installing it and specifying it in their solrconfig.xml. I would encourage you to consider that, not as a replacement for suggesting a patch to Solr itself, but so people can use your improved spellchecker immediately, without waiting for possible Solr patches. Jonathan
Re: Is there a cache for a query?
As far as I know all searches get cached at least for some time. I am not sure about field collapse results being cached. - Moazzam http://moazzam-khan.com On Mon, Jul 26, 2010 at 9:48 PM, Li Li fancye...@gmail.com wrote: I want a cache to cache the whole result of a query (all steps including collapse, highlight and facet). I read http://wiki.apache.org/solr/SolrCaching, but can't find a global cache. Maybe I can use an external cache to store key-value pairs. Is there any one in solr?
Re: SolrJ Response + JSON
Hi Chantal, thank you for the feedback. I did not see the wood for the trees! The SolrDocument javadoc (http://lucene.apache.org/solr/api/org/apache/solr/common/SolrDocument.html) says the following: getFieldValue(String name) - Get the value or collection of values for a given field. The magical word here is that little or :-). I will try that tomorrow and give you feedback! Are you sure that you cannot change the SOLR results at query time according to your needs? Unfortunately, it is not possible in this case. Kind regards, Mitch Am 28.07.2010 16:49, schrieb Chantal Ackermann: Hi Mitch On Wed, 2010-07-28 at 16:38 +0200, MitchK wrote: Thank you, Chantal. I have looked at this one: http://www.json.org/java/index.html This seems to be an easy-to-understand implementation. However, I am wondering how to determine whether a SolrDocument's field is multiValued or not. The JSONResponseWriter of Solr looks at the schema configuration, but the client shouldn't have to do that. How did you solve that problem? I didn't. I'm not recreating JSON from the SolrJ results. I would try to use the same classes that SolrJ uses, actually. (Writing that without having a further look at the code.) I would avoid recreating existing code as much as possible. About multivalued fields: you need instanceof checks, I guess. The field only contains a list if there really are multiple values. (That's what works for my ScriptTransformer.) Are you sure that you cannot change the Solr results at query time according to your needs? Maybe you should ask for that first (ask for X instead of Y...). Cheers, Chantal
RE: simple question from a newbie
I think I got it to work. If I do a wildcard search using the dc3.title field it seems to work fine (dc3.title:c*). The dc.title:c* search returns every title that has a word in it that starts with 'c', which isn't exactly what I wanted. I'm guessing that's because of the type=caseInsensitiveSort. Well, here is my schema for reference. Thanks for your help.

<schema name="example" version="1.1">
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    <!-- boolean type: true or false -->
    <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="integer" class="solr.IntField" omitNorms="true"/>
    <fieldType name="long" class="solr.LongField" omitNorms="true"/>
    <fieldType name="float" class="solr.FloatField" omitNorms="true"/>
    <fieldType name="double" class="solr.DoubleField" omitNorms="true"/>
    <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="slong" class="solr.SortableLongField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="sdouble" class="solr.SortableDoubleField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="textTight" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="caseInsensitiveSort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.TrimFilterFactory"/>
      </analyzer>
    </fieldType>
    <fieldtype name="ignored" stored="false" indexed="false" class="solr.StrField"/>
  </types>
  <fields>
    <!-- Fedora specific fields -->
    <field name="PID" type="string" indexed="true" stored="true"/>
    <field name="fgs.state" type="string" indexed="true" stored="true"/>
    <field name="fgs.label" type="text" indexed="true" stored="true"/>
    <field name="fgs.ownerId" type="string" indexed="true" stored="true"/>
    <field name="fgs.createdDate" type="date" indexed="true" stored="true"/>
    <field name="fgs.lastModifiedDate" type="date" indexed="true" stored="true"/>
    <field name="fgs.contentModel" type="string" indexed="true" stored="true"/>
    <field name="fgs.type" type="string" indexed="true" stored="true" multiValued="true"/>
    <!-- DC Fields -->
    <field name="dc.contributor" type="text" indexed="true" stored="true" multiValued="true"/>
    <field name="dc.coverage" type="text" indexed="true" stored="true" multiValued="true"/>
    <field name="dc.creator" type="text" indexed="true" stored="true" multiValued="true"/>
    <field name="dc.date" type="text" indexed="true" stored="true" multiValued="true"/>
    <field name="dc.description" type="text" indexed="true" stored="true" multiValued="true"/>
    <field name="dc.format" type="text" indexed="true" stored="true" multiValued="true"/>
    <field name="dc.identifier" type="text" indexed="true" stored="true" multiValued="true"/>
    <field name="dc.language" type="text" indexed="true" stored="true" multiValued="true"/>
    <field name="dc.publisher" type="text" indexed="true" stored="true" multiValued="true"/>
    <field name="dc.relation" type="text" indexed="true"
Solr 1.4.1 field collapse
Hi guys, I read somewhere that Solr 1.4.1 has field collapse support by default (without patching it) but I haven't been able to confirm it. Is this true? - Moazzam
Re: slave index is bigger than master index
Well I do have disk limitations too, and that's why I think the slave nodes died when replicating data from the master node (as it was just adding on top of the existing index files). :: What do you mean here? Optimizing is too CPU expensive? What I meant by "avoid playing around with slave nodes" is avoiding anything (including optimizing on slave nodes) that may affect the live search performance, unless I have no option. :: Do you mean increase to double size? Yes, as it did before on replication. But I didn't get a chance to run the indexer yesterday.
Re: slave index is bigger than master index
In solrconfig.xml, these two lines control that. Maybe they need to be increased.

<str name="httpConnTimeout">5000</str>
<str name="httpReadTimeout">1</str>

Where do I add those in solrconfig? These lines don't seem to be present in the example solrconfig file...
How do NOT queries work?
I wonder how NOT queries work. Is it a pass over the result set that filters out documents with the NOT property, or something like that? Also, has anybody done performance checks on NOT queries? I want to know whether there is a significant performance degradation when you have a NOT in a query. Thanks... //kaan
RE: display solr result in JSP
Thanks so much for your reply. I don't have much experience with JSP. I found a tag library, and am trying to use <xsltlib:apply xml="<%= url.getContent().toString() %>" xsl="/xsl/result.xsl"/>. Unfortunately I didn't get it to work. Would you please give me more information? I really appreciate your help! Thanks, Xiaohui -Original Message- From: Ranveer [mailto:ranveer.s...@gmail.com] Sent: Wednesday, July 28, 2010 11:27 AM To: solr-user@lucene.apache.org Subject: Re: display solr result in JSP Hi, it is very simple to display a value in a JSP. If you are using solrj then simply store the value in a bean from your java class and you can display it. You can do the same thing in a servlet too: get the solr server response and return it in a bean, or display it directly (in the servlet). Hope you will be able to do it. regards Ranveer On Wednesday 28 July 2010 08:11 PM, Ma, Xiaohui (NIH/NLM/LHC) [C] wrote: I am new to Solr. I just got the example XML files indexed and searchable by following the Solr tutorial. I wonder how I can get the search results displayed in a JSP. I really appreciate any suggestions you can give. Thanks so much, Xiaohui
Re: Total number of terms in an index?
Tom, The total number of terms... Ah well, not a big deal, however yes the flex branch does expose this so we can show this in Solr at some point, hopefully outside of Solr's Luke impl. On Tue, Jul 27, 2010 at 9:27 AM, Burton-West, Tom tburt...@umich.edu wrote: Hi Jason, Are you looking for the total number of unique terms or the total number of term occurrences? CheckIndex reports both, but does a bunch of other work so is probably not the fastest. If you are looking for the total number of term occurrences, you might look at contrib/org/apache/lucene/misc/HighFreqTerms.java. If you are just looking for the total number of unique terms, I wonder if there is some low level API that would allow you to just access the in-memory representation of the tii file and then multiply the number of terms in it by your indexDivisor (default 128). I haven't dug into the code so I don't actually know how the tii file gets loaded into a data structure in memory. If there is api access, it seems like this might be the quickest way to get the number of unique terms. (Of course you would have to do this for each segment.) Tom -Original Message- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: Monday, July 26, 2010 8:39 PM To: solr-user@lucene.apache.org Subject: Re: Total number of terms in an index? : Sorry, like the subject, I mean the total number of terms. it's not stored anywhere, so the only way to fetch it is to actually iterate all of the terms and count them (that's why LukeRequestHandler is so slow to compute this particular value). If i remember right, someone mentioned at one point that flex would let you store data about stuff like this in your index as part of the segment writing, but frankly i'm still not sure how that will help -- because unless your index is fully optimized, you still have to iterate the terms in each segment to 'de-dup' them. -Hoss
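[For reference, the brute-force count looks like this with the Lucene 3.x-era API; it pays exactly the iterate-everything cost Hoss describes. The index path is a placeholder:]

    import java.io.File;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.TermEnum;
    import org.apache.lucene.store.FSDirectory;

    public class TermCounter {
        public static void main(String[] args) throws Exception {
            // open read-only; "/path/to/index" is a placeholder
            IndexReader reader = IndexReader.open(FSDirectory.open(new File("/path/to/index")), true);
            TermEnum terms = reader.terms();
            long count = 0;
            while (terms.next()) {
                count++; // visits every unique (field, text) pair once, merged across segments
            }
            terms.close();
            reader.close();
            System.out.println("unique terms: " + count);
        }
    }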
RE: Extracting PDF text/comment/callout/typewriter boxes with Solr CELL/Tika/PDFBox
Tommaso, I used your patch and tried it with the 1.4.1 solr.war from a fresh 1.4.1 distribution, and it still gave me that NoSuchMethodError. However, when I tried it with the newly-patched-and-compiled apache-solr-1.4.2-dev.war file it works. I think I tried that before and it didn't work. In any case, thanks for the patch and the advice. Looks like now it's working for me. Best, Dave -Original Message- From: Tommaso Teofili [mailto:tommaso.teof...@gmail.com] Sent: Wednesday, July 28, 2010 3:31 AM To: solr-user@lucene.apache.org Subject: Re: Extracting PDF text/comment/callout/typewriter boxes with Solr CELL/Tika/PDFBox I attached a patch for the Solr 1.4.1 release on https://issues.apache.org/jira/browse/SOLR-1902 that made things work for me. This strange behaviour for me was due to the fact that I copied the patched jars and war inside the dist directory but forgot to update the war inside the example/webapps directory (that is inside Jetty). Hope this helps. Tommaso 2010/7/27 David Thibault dthiba...@esperion.com Alessandro and all, I was having the same issue with Tika crashing on certain PDFs. I also noticed the bug where no content was extracted after upgrading Tika. When I went to the SOLR issue you link to below, I applied all the patches, downloaded the Tika 0.8 jars, restarted tomcat, posted a file via curl, and got the following error: SEVERE: java.lang.NoSuchMethodError: org.apache.solr.core.SolrResourceLoader.getClassLoader()Ljava/lang/ClassLoader; at org.apache.solr.handler.extraction.ExtractingRequestHandler.inform(ExtractingRequestHandler.java:93) at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:244) at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:231) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298) at org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:859) at org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:579) at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1555) at java.lang.Thread.run(Thread.java:619) This is really weird because I DID apply the SolrResourceLoader patch that adds the getClassLoader method. I even verified by opening up the JARs and looking at the class file in Eclipse... I can see the SolrResourceLoader.getClassLoader() method. Does anyone know why it can't find the method? After patching the source I did ant clean dist in the base directory of the Solr source tree and everything looked like it compiles (BUILD SUCCESSFUL). Then I copied all the jars from dist/ and all the library dependencies from contrib/extraction/lib/ into my SOLR_HOME. Restarting tomcat, everything in the logs looked good. I'm stumped. It would be very nice to have a Solr implementation using the newest versions of PDFBox and Tika and actually have content being extracted... =) Best, Dave -Original Message- From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com] Sent: Tuesday, July 27, 2010 6:09 AM To: solr-user@lucene.apache.org Subject: Re: Extracting PDF text/comment/callout/typewriter boxes with Solr CELL/Tika/PDFBox Hi Jon, During the last days we faced the same problem. Using Solr 1.4.1 classic (tika 0.4), from some pdf files we can't extract content and from others, Solr throws an exception during the Indexing Process. You must: Update the tika libraries (in /contrib/extraction/lib) with the tika-core 0.8 snapshot and tika-parsers 0.8. Update PDFBox and all related libraries. After that you have to patch Solr 1.4.1 following this patch: https://issues.apache.org/jira/browse/SOLR-1902?page=com.atlassian.jira.plugin.ext.subversion%3Asubversion-commits-tabpanel This is the first way to solve the problem. Using Solr 1.4.1 (with the tika 0.8 snapshot and pdfbox updated) no exception is thrown during the Indexing process, but no content is extracted. Using the latest Solr trunk (with the tika 0.8 snapshot and pdfbox updated)
Re: Total number of terms in an index?
At first I was thinking the TermsComponent might give you this, but oddly it seems not to. http://wiki.apache.org/solr/TermsComponent
RE: How to 'filter' facet results
ManBearPig is still a threat. -Kallin Nagelberg -Original Message- From: Jonathan Rochkind [mailto:rochk...@jhu.edu] Sent: Tuesday, July 27, 2010 7:44 PM To: solr-user@lucene.apache.org Subject: RE: How to 'filter' facet results Is there a way to tell Solr to only return a specific set of facet values? I feel like the facet query must be able to do this, but I'm not really understanding the facet query. In my specific case, I'd like to only see facet values for the same values I pass in as query filters, i.e. if I run this query: fq=keyword:man OR keyword:bear OR keyword:pig facet=on facet.field:keyword then I only want it to return the facet counts for man, bear, and pig. The resulting docs might have a number of different values for keyword, in addition For the general case of filtering facet values, I've wanted to do that too in more complex situations, and there is no good way I've found. For your very specific use case though, yeah, you can do it with facet.query. Leave out the facet.field, but instead: facet.query=keyword:man facet.query=keyword:bear facet.query=keyword:pig You'll get three facet.query results in the response, one each for man, bear, pig. Solr behind the scenes will kind of do three separate 'sub-queries', one for each facet.query, but since the query itself should be cached, you shouldn't notice much difference. Especially if you have a warming query that facets on the keyword field (I'm never entirely sure when caches created by warming queries will be used by a facet.query, or if it depends on the facet method in use, but it can't hurt). Jonathan
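[A quick SolrJ sketch of Jonathan's facet.query suggestion; the field and values come from the question, the server handle is an assumption:]

    import java.util.Map;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class FacetQueries {
        static void showCounts(SolrServer server) throws Exception {
            SolrQuery q = new SolrQuery("*:*");
            q.addFilterQuery("keyword:man OR keyword:bear OR keyword:pig");
            q.setFacet(true);
            // one facet.query per value we care about, instead of facet.field=keyword
            q.addFacetQuery("keyword:man");
            q.addFacetQuery("keyword:bear");
            q.addFacetQuery("keyword:pig");
            QueryResponse rsp = server.query(q);
            // counts come back keyed by the literal facet.query strings
            Map<String, Integer> counts = rsp.getFacetQuery();
            System.out.println(counts.get("keyword:bear"));
        }
    }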
Problem with field collapsing
Hi All, Whenever I use field collapsing, the numFound attribute contains exactly as many rows as I put in the rows parameter, instead of the total number of documents that matched the query. Is there a way to rectify this? Thanks, Moazzam
Re: SolrCore has a large number of SolrIndexSearchers retained in infoRegistry
Hi, It didn't seem like it improved the situation. The same exception stack traces are found. I have explicitly defined the index readers to be reopened by specifying that in the solrconfig.xml. The exception occurs when the remote cores are being searched. I am attaching the exceptions in a text file for reference. http://lucene.472066.n3.nabble.com/file/n1002926/solrexceptions.txt solrexceptions.txt Couple of notes: 1. QueryComponent#process is requesting a SolrIndexSearcher twice by calling SolrQueryRequest#getSearcher(), but it is never being closed. I see several instances where getSearcher is being called but is never being properly closed - performing a quick call hierarchy of SolrQueryRequest#getSearcher() and SolrQueryRequest#close() will illustrate this point. 2. It may be the case that this exception was never encountered because typical deployments are not heavily using Distributed Search across multiple Solr Cores, and/or it's a small memory leak and so was never noticed.
Re: Using Solr to perform range queries in Dspace
: I'm trying to use dspace to search across a range of index created and stored
: using Dsindexer.java class. I have seen where Solr can be use to perform

I've never heard of Dsindexer.java, but since this is the first result google returns... http://scm.dspace.org/trac/dspace/browser/trunk/dspace/src/org/dspace/search/DSIndexer.java?rev=970 ...i'm going to assume that's what you are talking about.

: numerical range queries using either TrieIntField,
: TrieDoubleField, TrieLongField, etc.. classes defined in Solr's api or
: SortableIntField.java, SortableLongField, SortableDoubleField.java. I would
: like to know how to implement these classes in Dspace so that I can be able
: to perform numerical range queries. Any help would be greatly appreciated.

i *think* what you are asking is how to use Solr to search the numeric fields in an existing Lucene index (created by the above mentioned java code) -- but i may be wrong (your choice of wording "implement these classes in Dspace" is very perplexing to me). If i'm understanding correctly, then the key to the issue is all in how the numeric values are indexed as lucene Fields in your existing code -- but in the copy of DSIndexer.java i found, there are no numeric fields, just Text fields. If you are indexing the numeric values as simple strings, then in Solr you would want to refer to them using the legacy IntField, FloatField, etc... these assume simple string representations, and will sort properly using the numeric FieldCache -- BUT! -- range queries won't work. Range queries require that the indexed terms be in a logical ordering, which isn't true for simple string representations of numbers (100 is lexicographically before 2). If i actually have your question backwards -- if what you are asking is how to modify the DSIndexer.java class to index fields in the same way as TrieDoubleField, TrieLongField, SortableIntField, etc... -- then the answer is much simpler: all FieldTypes in Solr implement toInternal and toExternal methods ... toInternal is what you need to call to encode your simple numeric values into the format to be indexed -- toExternal (or toObject) is how you can get the original value back out. For the Trie fields, these actually just use some utilities in Lucene, so you could look at the code and use the same utilities w/o ever needing any Solr source code. If i've completely misunderstood your question, please post a followup explaining in more detail what it is you are trying to accomplish. -Hoss
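[As a sketch of the Lucene-side utilities Hoss alludes to (Lucene 2.9+): NumericField writes trie-encoded terms at index time, NumericRangeQuery searches them. The field name and values here are illustrative, not from the thread:]

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.NumericField;
    import org.apache.lucene.search.NumericRangeQuery;

    public class TrieExample {
        static Document makeDoc(int price) {
            Document doc = new Document();
            // precisionStep 4: indexes extra lower-precision terms so range queries stay cheap
            doc.add(new NumericField("price", 4, Field.Store.YES, true).setIntValue(price));
            return doc;
        }

        static NumericRangeQuery<Integer> priceBetween(int lo, int hi) {
            // both endpoints inclusive; precisionStep must match the one used at index time
            return NumericRangeQuery.newIntRange("price", 4, lo, hi, true, true);
        }
    }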
Know which terms are in a document
I would like to search against my index, and then *know* which of a set of given terms were found in each document. For example, let's say I want to show articles with the word pizza or cake in them, but would like to be able to say which of those two was found. I might use this to handle the article differently if it is about pizza, or if it is about cake. I understand I can do multiple queries but I would like to avoid that. One thought I had was to use a highlighter and only return a fragment with the highlighted word, but I'm not sure how to do this with the various highlighting options. Is there a way? Thanks.
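[A rough sketch of that highlighter idea with SolrJ, under two assumptions: the text lives in a stored field called "body" and the unique key field is "id" (both placeholders):]

    import java.util.List;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;

    public class PizzaOrCake {
        static void classify(SolrServer server) throws Exception {
            SolrQuery q = new SolrQuery("body:pizza OR body:cake");
            q.setHighlight(true);
            q.addHighlightField("body");
            q.setHighlightSimplePre("<em>");
            q.setHighlightSimplePost("</em>");
            QueryResponse rsp = server.query(q);
            for (SolrDocument doc : rsp.getResults()) {
                String id = (String) doc.getFieldValue("id");
                List<String> frags = rsp.getHighlighting().get(id).get("body");
                // each fragment wraps the matched word in <em>...</em>, so checking for
                // "<em>pizza</em>" vs "<em>cake</em>" tells the two terms apart per document
                System.out.println(id + " -> " + frags);
            }
        }
    }

[Note that stemming or synonyms on the field will also show up highlighted, so the string check should be as loose as the analysis chain.]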
Re: Show elevated Result Differently
Please expand on what this means, it's quite vague. You might review: http://wiki.apache.org/solr/UsingMailingLists Best Erick On Wed, Jul 28, 2010 at 8:43 AM, Vishal.Arora vis...@value-one.com wrote: I want to show the elevated results differently from the others. Is there any way to do this?
Re: simple question from a newbie
What is the query you submit (don't forget &debugQuery=on)? In particular, what field are you sorting on? But yes, if you're searching on a tokenized field, you'll get matches on all tokens in that field, which are probably single words. And no matter how you sort, you're still getting documents where the whole title doesn't start with c. What happens if you search on your dc3.title instead? It uses the keyword tokenizer, which tokenizes the entire title as a single token. Sort by that one too. Best Erick On Wed, Jul 28, 2010 at 12:26 PM, Nguyen, Vincent (CDC/OSELS/NCPHI) (CTR) v...@cdc.gov wrote: I think I got it to work. If I do a wildcard search using the dc3.title field it seems to work fine (dc3.title:c*). The dc.title:c* search returns every title that has a word in it that starts with 'c', which isn't exactly what I wanted. I'm guessing that's because of the type=caseInsensitiveSort. Well, here is my schema for reference. Thanks for your help.
Re: Solr using 1500 threads - is that normal?
1,500 threads seems extreme by any standards so there is something happening in your install. Even with appservers for web apps, typically 100 would be a fair # of threads. On 7/28/10, Christos Constantinou ch...@simpleweb.co.uk wrote: Hi, Solr seems to be crashing after a JVM exception that new threads cannot be created. I am writing in hope of advice from someone that has experienced this before. The exception that is causing the problem is: Exception in thread btpool0-5 java.lang.OutOfMemoryError: unable to create new native thread The memory that is allocated to Solr is 3072MB, which should be enough memory for a ~6GB data set. The documents are not big either, they have around 10 fields of which only one stores large text ranging between 1k-50k. The top command at the time of the crash shows Solr using around 1500 threads, which I assume is not normal. Could it be that the threads are crashing one by one and new ones are created to cope with the queries? In the log file, right after the exception, there are several thousand commits before the server stalls completely. Normally, the log file would report 20-30 document existence queries per second, then 1 commit per 5-30 seconds, and some more infrequent faceted document searches on the data. However after the exception, there are only commits until the end of the log file. I am wondering if anyone has experienced this before or if it is some sort of known bug from Solr 1.4? Is there a way to increase the details of the exception in the logfile? I am attaching the output of a grep Exception command on the logfile.

Jul 28, 2010 8:19:31 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:19:31 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:19:31 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:19:32 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:20:18 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:20:48 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:22:43 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:27:53 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:27:53 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:27:53 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:28:50 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:33:19 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:35:08 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:35:58 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:35:59 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:44:31 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:51:49 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of
Re: Solr using 1500 threads - is that normal?
Your commits are very suspect. How often are you making changes to your index? Do you have autocommit on? Do you commit when updating each document? Committing too often and consequently firing off warmup queries is the first place I'd look. But I agree with dc tech, 1,500 is way more than I would expect. Best Erick On Wed, Jul 28, 2010 at 6:53 AM, Christos Constantinou ch...@simpleweb.co.uk wrote: Hi, Solr seems to be crashing after a JVM exception that new threads cannot be created. I am writing in hope of advice from someone that has experienced this before. The exception that is causing the problem is: Exception in thread btpool0-5 java.lang.OutOfMemoryError: unable to create new native thread The memory that is allocated to Solr is 3072MB, which should be enough memory for a ~6GB data set. The documents are not big either, they have around 10 fields of which only one stores large text ranging between 1k-50k. The top command at the time of the crash shows Solr using around 1500 threads, which I assume is not normal. Could it be that the threads are crashing one by one and new ones are created to cope with the queries? In the log file, right after the exception, there are several thousand commits before the server stalls completely. Normally, the log file would report 20-30 document existence queries per second, then 1 commit per 5-30 seconds, and some more infrequent faceted document searches on the data. However after the exception, there are only commits until the end of the log file. I am wondering if anyone has experienced this before or if it is some sort of known bug from Solr 1.4? Is there a way to increase the details of the exception in the logfile? I am attaching the output of a grep Exception command on the logfile.
Re: WordDelimiterFilter and phrase queries?
: pos token offset
: 1 3 0-1
: 2 diphenyl 2-10
: 3 propanoic 11-20
: 3 diphenylpropanoic 2-20
: Say someone enters the query string 3-diphenylpropanoic
:
: The query parser I'm using transforms this into a phrase query and the
: indexed form is missed because the positions of the terms '3'
: and 'diphenylpropanoic' indicate they are not adjacent?
:
: Is this intended behavior? I expect that the catenated word
: 'diphenylpropanoic' should have a position of 2 based on the position
: of the first term in the concatenation, but perhaps I'm missing

I believe this is correct, but i'm not certain of the reason - i think it's just an implementation detail. Consider the opposite scenario: if your indexed text was diphenyl-propanoic-3 and things worked the way you are suggesting they should, the term diphenylpropanoic would end up at position 1 (with diphenyl) and diphenylpropanoic-3 would not match because then the terms wouldn't be adjacent. damned if you do, damned if you don't. typically for fields where you are using WDF with the concat options you would usually use a bit of slop on the generated phrase queries to allow for the looseness of the position information. (in an ideal world, the token stream wouldn't have monotonic integer positions, it would be a DAG, and then these things would be easily represented, but that's pretty non-trivial to do with the internals.) -Hoss
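[To make the suggested slop concrete: in the standard Lucene query syntax it is the ~N suffix on a phrase (with dismax, the qs parameter plays the same role). This query is illustrative, not from the thread:

q = "3 diphenylpropanoic"~1

One position of slop is enough here, since the catenated token sits one position further from '3' than a strict phrase match expects.]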
Re: Scoring Search for autocomplete
You weren't really clear on how you are generating your autocomplete results -- ie: via TermsComponent on your main index? or via a search on a custom index where each document is a word to be suggested? Assuming the latter, then the approach you describe below sounds good to me, but it doesn't seem like it would really make sense for the former.

: Hi, I have an autocomplete that is currently working with an
: NGramTokenizer so if I search for Yo both New York and Toyota
: are valid results. However I'm trying to figure out how to best
: implement the search so that from a score perspective if the string
: matches the beginning of an entire field it ranks first, followed by
: the beginning of a term and then in the middle of a term. For example
: if I was searching with vi I would want Virginia ahead of West
: Virginia ahead of Five.
:
: I think I can do this with three separate fields, one using a whitespace
: tokenizer and an ngram filter, another using the edge-ngram +
: whitespace and another using keyword+edge-ngram, then doing an or on
: the 3 fields, so that Virginia would match all 3 and get a higher
: score... but this doesn't feel right to me, so I wanted to check for
: better options.
:
: Thanks.

-Hoss
Help with schema design
Hi, I have a use case where I get a document and a list of events that have happened on the document. For example: First document: Some text content Events:

Event Type   Event By   Event Time
Update       Pramod     06062010 2:30:00
Update       Raj        06062010 2:30:00
View         Rahul      07062010 1:30:00

I would like to support queries like "get all documents where Event Type = ? and Event Time is greater than ?", and also queries like "get all the documents updated by Pramod". How should I design my schema to support this use case? Thanks, Regards, Pramod Goyal
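[One common answer - sketched here as an assumption, not from the thread - is to denormalize: index one Solr document per document/event pair, repeating the document fields. A SolrJ sketch with hypothetical field names:]

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class EventIndexer {
        static void indexEvent(SolrServer server) throws Exception {
            SolrInputDocument d = new SolrInputDocument();
            d.addField("id", "doc1-event1");           // unique key per (document, event) pair
            d.addField("doc_id", "doc1");              // groups events back to their document
            d.addField("content", "Some text content");
            d.addField("event_type", "Update");
            d.addField("event_by", "Pramod");
            d.addField("event_time", "2010-06-06T14:30:00Z"); // Solr date-field format
            server.add(d);
            server.commit();
            // queries then look like:
            //   event_type:Update AND event_time:[2010-06-06T00:00:00Z TO *]
            //   event_type:Update AND event_by:Pramod
        }
    }

[The price of this layout is duplicated document text and the need to de-duplicate on doc_id when presenting results.]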
Is solr able to merge index on different nodes
If I want to create a large index, can I split the indexing across different nodes and then merge all the indexes onto one node? Any further suggestions for this case?
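[If the partial indexes share one schema, the Lucene-level merge is straightforward - a sketch against the Lucene 3.x API of the time, with placeholder paths and analyzer:]

    import java.io.File;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class MergeIndexes {
        public static void main(String[] args) throws Exception {
            IndexWriter writer = new IndexWriter(
                    FSDirectory.open(new File("/indexes/merged")),
                    new StandardAnalyzer(Version.LUCENE_30),
                    true, // create a fresh target index
                    IndexWriter.MaxFieldLength.UNLIMITED);
            writer.addIndexesNoOptimize(new Directory[] {
                    FSDirectory.open(new File("/indexes/part1")),
                    FSDirectory.open(new File("/indexes/part2")) });
            writer.optimize(); // optional, but leaves a single clean segment
            writer.close();
        }
    }

[Lucene's contrib/misc also ships an IndexMergeTool that does essentially this from the command line.]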
Re: logic required for newbie
First of all, I hope that in the schema you have set indexed=true and stored=true for the fields. Next, if you have done so, then just search q=landmark:piza... you will get one result set only. Note: there is one constraint about applying analyzers and tokenizers. If you apply the whitespace tokenizer - that is, data type text_ws - only then will you get the 'piza hut' result set even when you query just for 'piza'. If no tokenizer is applied, you will not get it. I hope this was the needed reply. If there is something else, you can easily ask. ;) On Wed, Jul 28, 2010 at 8:42 PM, Jonty Rhods jonty.rh...@gmail.com wrote: Hi, thanks for the reply. Actually the requirement is different (sorry if I was unable to clarify it in the first mail). Basically the following are the field names in the schema as well: 1. id 2. name 3. user_id 4. location 5. country 6. landmark1 7. landmark2 8. landmark3 9. landmark4 10. landmark5, which carry text, for example:

<id>1</id>
<name>some name</name>
<user_id>user_id</user_id>
<location>new york</location>
<country>USA</country>
<landmark1>5th avenue</landmark1>
<landmark2>ms departmental store</landmark2>
<landmark3>base bakery</landmark3>
<landmark4>piza hut</landmark4>
<landmark5>ford motor</landmark5>

Now if the user searches by piza then the expected result is:

<id>1</id>
<name>some name</name>
<user_id>user_id</user_id>
<location>new york</location>
<country>USA</country>
<landmark4>piza hut</landmark4>

It means I want to ignore all the other landmarks which do not match. With a filter we can filter the fields, but here I don't know the field name because it depends on the text match. Is there any other solution? I am ready to change the schema or the logic. I am using solrj. Please help me, I am stuck here. with regards
Re: SolrJ Response + JSON
Yeah right... This query will do it: http://localhost:8090/solr/select/?q=*:*&version=2.2&start=0&rows=10&indent=on&wt=json This will do your work. This is more like using the XSL transformation supported by Solr. :) Regards, Rajani Maski On Wed, Jul 28, 2010 at 6:24 PM, Mark Allan mark.al...@ed.ac.uk wrote: I think you should just be able to add wt=json to the end of your query (or change whatever the existing wt parameter is in your URL). Mark On 28 Jul 2010, at 12:54 pm, MitchK wrote: Hello community, I need to transform SolrJ responses into JSON, after some computing on those results by another application has finished. I cannot do those computations on the Solr side. So, I really have to translate SolrJ's output into JSON. Any experiences how to do so without writing your own JSON writer? Thank you. - Mitch -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.