date:20120531

Re: A few random questions about solr queries.

2012-05-31 Thread santamaria2

A wee bit of clarification on the 2nd question. I meant relative performance,
i.e. would it be much slower to facet over 20 facet.queries and 10 facet.fields
compared to, say, 4 facet.queries and facet.fields? I wonder if this makes
sense...

So... is a bump improper etiquette here?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/A-few-random-questions-about-solr-queries-tp3986562p3986977.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: solr 1.3 Multicores and maxboolean clause

2012-05-31 Thread Sujatha Arun

Thanks, Jack.

In which case the template cores would be the ones initialized first, and we
need to take care of this in the template configs.

Also, I noticed that when we remove core0 and core1, create a new webapp with
no cores and an empty solr.xml, and then try to create a new core, we get an
error and the core is not created.

Regards
Sujatha

On Thu, May 31, 2012 at 12:40 AM, Jack Krupansky j...@basetechnology.com wrote:

As per the source code, Solr only sets the BooleanQuery clause limit on
the very first core load. It ignores the setting on subsequent core
loads, including a reload of the initial core.

SolrCore.java: // only change the BooleanQuery maxClauseCount once for
ALL cores...

The cores should get loaded in the order they appear in solr.xml, although
I don't know if that is a written, contractual guarantee.

As the CoreAdmin wiki page says, "Workaround, set maxBooleanClauses to the
greatest value desired in *all* cores."

See:
http://wiki.apache.org/solr/CoreAdmin#Known_Issues

The wiki is wrong when it says "Whichever Solr core initializes last will
win the setting of the solrconfig.xml's maxBooleanClauses value." The first
core to be loaded wins. Or, maybe the source code is wrong. Either way, a
correction is needed.
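
As a rough illustration of the set-once behaviour described above, here is a toy model in Python. It is not Solr's code: the class, the function, and the numbers are all made up for the sketch, and it only mimics the idea of a JVM-wide value that the first core load fixes for everyone.

```python
class BooleanQueryLimit:
    """Toy stand-in for a JVM-wide static value shared by all cores."""
    max_clause_count = 1024   # arbitrary starting value for this sketch
    already_set = False


def load_core(name, max_boolean_clauses):
    """Apply a core's configured limit only if no core has set it before."""
    if not BooleanQueryLimit.already_set:
        BooleanQueryLimit.max_clause_count = max_boolean_clauses
        BooleanQueryLimit.already_set = True
    # Later loads (and reloads) leave the shared value untouched.
    return BooleanQueryLimit.max_clause_count


# core0 loads first with 512; core1 asks for 4096; core0 is reloaded with 8192.
effective = [
    load_core("core0", 512),
    load_core("core1", 4096),
    load_core("core0", 8192),
]
```

In this model every call reports the value from the first load, which matches the symptom of a core reload not picking up a changed setting.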

-- Jack Krupansky

-----Original Message-----
From: Sujatha Arun
Sent: Wednesday, May 30, 2012 1:30 PM
To: solr-user@lucene.apache.org
Subject: solr 1.3 Multicores and maxboolean clause

Hello,

The solrcore Wiki says that "Lucene's BooleanQuery
(http://wiki.apache.org/solr/BooleanQuery) maxClauseCount is a static
variable, making it a single value across the entire JVM. Whichever Solr core
initializes last will win the setting of the solrconfig.xml's
maxBooleanClauses value. Workaround, set maxBooleanClauses to the greatest
value desired in *all* cores."

Now what I see is that if any one core has a smaller value for
maxBooleanClauses, the smaller one takes effect, and not that of the last
core created.

Some questions:

1. What is the order of initialization of the cores on a server restart? I
   don't see this info in the logs.
2. When I change maxBooleanClauses on one core and reload the core, the change
   does not take effect. Does this require a Tomcat restart? Why?
3. The default cores core0 and core1 that come in the example multicore setup
   do not have this value set, as they have a minimal configuration. Does this
   affect the value in other cores if I use them as the default?

Regards,
Sujatha

Re: Accent Characters

2012-05-31 Thread Sami Siren

Vicente,

Are you using CommonsHttpSolrServer or HttpSolrServer? If the latter
then you are probably hitting this:
https://issues.apache.org/jira/browse/SOLR-3375

The remedy is to use CommonsHttpSolrServer.

--
 Sami Siren

On Thu, May 31, 2012 at 7:52 AM, Vicente Couto couto.vice...@gmail.com wrote:
 Hello, Jack.

 Yeah, I'm screwed up.

 Well, the documents are indexed with the accents.
 I started a new clean solr 3.6 configuration, with as few changes as
 possible; I'm running two cores, one for English and another one for French.
 Here is where I am now: if I run queries using SolrJ, it does some
 sort of encoding. For example, I can see in the logs that if I run a
 query looking for pré, I get

 INFO: [coreFR] webapp=/solr path=/select
 params={fl=*,score&q=content:prÃ©&hl.fl=content&hl.maxAnalyzedChars=10&hl=true}
 hits=0 status=0 QTime=0

 And I can't see any results. Encoding the query to UTF-8 myself doesn't
 work either.
 But if I simply put the HTTP calls into the browser address bar, for example, it
 works perfectly!
 So, how can I tell SolrJ not to encode the queries?
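
The garbled prÃ© in the log line above is consistent with the UTF-8 bytes of pré being read back as ISO-8859-1 somewhere along the way. A minimal sketch reproducing that pattern in plain Python (no Solr involved; it only shows the symptom, and says nothing about where in the client or the servlet container the misreading happens):

```python
query_term = "pré"

# UTF-8 encodes the accented letter as two bytes.
utf8_bytes = query_term.encode("utf-8")

# Reading those two bytes as ISO-8859-1 yields two separate characters.
misread = utf8_bytes.decode("iso-8859-1")

# Reversing the wrong decoding step recovers the original term.
recovered = misread.encode("iso-8859-1").decode("utf-8")
```

If the term in the log looks like `misread`, the request was most likely sent as UTF-8 and decoded with a single-byte charset on the receiving side.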

 Thank you

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Accent-Characters-tp3985931p3986970.html

Re: how to read fieldValueCacheStatistics

2012-05-31 Thread elisabeth benoit

ok, thanks a lot for the answer.

Elisabeth

2012/5/31 Chris Hostetter hossman_luc...@fucit.org


 : When I read fieldValueCache statistics I have something that looks like
 :
 : item_ABC_FACET :
 :
 {field=ABC_FACET,memSize=4224,tindexSize=32,time=92,phase1=92,nTerms=0,bigTerms=0,termInstances=0,uses=11}
 :
 :
 : is there a doc somewhere that explains what are

 ...technically that's one stat, showing you an UnInvertedField
 instance in the cache (that's the string-ification of that
 UnInvertedField)

 the specifics of what those numbers mean are definitely what i would
 consider expert level ... off the top of my head the only ones i am
 fairly sure of are:

 memSize - how many bytes of ram it's using
 time - how long it took to build
 nTerms - number of unique terms in that field
 bigTerms - number of big terms, i.e. terms that have such a high docFreq that
 they weren't un-inverted because it would be too inefficient.

 In general, this level of detail is the kind of thing where you should
 probably review the code.
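
For anyone who wants to track these numbers over time, the stat string quoted above can be split into key/value pairs. A small sketch, assuming the value always has the `{key=value,...}` shape shown in the question (the function name is made up, and the format is not a documented contract):

```python
def parse_uninverted_stats(text):
    """Turn '{a=1,b=x}' into a dict, converting plain integers to int."""
    body = text.strip().strip("{}")
    stats = {}
    for pair in body.split(","):
        key, _, value = pair.partition("=")
        stats[key] = int(value) if value.isdigit() else value
    return stats


line = ("{field=ABC_FACET,memSize=4224,tindexSize=32,time=92,phase1=92,"
        "nTerms=0,bigTerms=0,termInstances=0,uses=11}")
stats = parse_uninverted_stats(line)
```

With the sample line, `stats["memSize"]` is the byte count and `stats["nTerms"]` the number of unique terms, following the meanings listed above.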


 -Hoss

Re: Poll: What do you use for Solr performance monitoring?

2012-05-31 Thread Vadim Kisselmann

Hi Otis,
done :) Till now we use Graphite, Ganglia and Zabbix. For our JVM
monitoring JStatsD.
Best regards
Vadim


2012/5/31 Otis Gospodnetic otis_gospodne...@yahoo.com:
 Hi,

 Super quick poll:  What do you use for Solr performance monitoring?
 Vote here: 
 http://blog.sematext.com/2012/05/30/poll-what-do-you-use-for-solr-performance-monitoring/


 I'm collecting data for my Berlin Buzzwords talk that will touch on Solr, so 
 your votes will be greatly appreciated!

 Thanks,
 Otis

Hightlighting and excerpt

2012-05-31 Thread Tolga


Hi,

Two separate things asked in one thread...

I am crawling my websites with nutch. When I index them, I'd like to be 
able to highlight my keyword and display an excerpt containing that 
keyword. I found a solution for highlighting, but what can I do about the excerpt?


Thanks and regards,

Re: Creating custom Filter / Tokenizer / Request Handler for integration of NER-Framework

2012-05-31 Thread Wunderlich, Tobias

Thanks for all the responses. I went with the UpdateRequestProcessor and it 
works.


-----Original Message-----
From: Lance Norskog [mailto:goks...@gmail.com]
Sent: Saturday, 26 May 2012 01:53
To: solr-user@lucene.apache.org
Subject: Re: Creating custom Filter / Tokenizer / Request Handler for
integration of NER-Framework

Another problem (just discovered this): TokenizerFactories do not get resource 
handlers. So, you can't go read config or model files for your Tokenizer. 
TokenFilters do, so you can use the KeywordTokenizer (make one big term) and do 
your work in a TokenFilter that gets the whole thing.

On Thu, May 24, 2012 at 7:33 AM, Jan Høydahl jan@cominvent.com wrote:
 As Ahmet says, The Update Chain is probably the place to integrate such 
 document oriented processing.
 See http://www.cominvent.com/2011/04/04/solr-architecture-diagram/ for how it 
 integrates with Solr.

 --
 Jan Høydahl, search solution architect Cominvent AS - 
 www.facebook.com/Cominvent Solr Training - www.solrtraining.com

 On 24. mai 2012, at 14:04, Wunderlich, Tobias wrote:

 Hey Guys,

 I have recently been working on a project to integrate a 
 Named-Entity-Recognition-Framework (NER) into an existing search platform based 
 on Solr. The Platform uses ManifoldCF to automatically gather the content 
 from various repositories. The NER-Framework creates Annotations/Metadata 
 from given content which I then want to integrate into the search-platform 
 as metadata to use for faceting. Since MCF handles all content gathering, I 
 need a way to integrate the NER-Framework directly into Solr. The Goal is to 
 get all Annotations per document into a multivalued field.  My first thought 
 was to create a custom filter, which just takes the content and gives back 
 only the Annotations.  But as I understand it, a filter only processes 
 predetermined Tokens, which is useless for my purpose, since the 
 NER-Framework needs to process the whole content of a document. What about a 
 custom Tokenizer? Would it be possible to process the whole text and give 
 back only the Annotations as Tokens? A third thought was to manipulate the 
 ExtractRequestHandler (Solr Cell) used by MCF to somehow add the Annotations 
 as Metadata when the content and metadata is distributed to the different 
 fields.

 I hope my problem description is sufficient. Does anybody have any thoughts 
 on that subject?

 Best regards,
 Tobias




--
Lance Norskog
goks...@gmail.com

Re: difference between Katta and SolrCloud (replicator factor)

2012-05-31 Thread Jamel ESSOUSSI

Hi,

responses please

-- Jamel E

--
View this message in context: 
http://lucene.472066.n3.nabble.com/difference-between-Katta-and-SolrCloud-replicator-factor-tp3986791p3986998.html

Using Data Import Handler to invoke a stored procedure with output (cursor) parameter

2012-05-31 Thread Niran Fajemisin

Hi all,

I've seen a few questions asked around invoking stored procedures from within 
Data Import Handler but none of them seem to indicate what type of output 
parameters were being used.

I have a stored procedure created in Oracle database that takes a couple input 
parameters and has an output parameter that is a reference cursor. The cursor 
is expected to be used as a way of iterating through the returned table rows. 
I'm using the following format to invoke my stored procedure in the Data Import 
Handler's data config XML:

<entity name="entity_name" ... query="{call my_stored_proc(inParam1, 
inParam2)}" ...></entity>

I have tested that this query works prior to attempting to use it from within 
the DIH. But when I attempt to invoke this stored procedure, it naturally 
complains that the output parameter is not specified (essentially a mismatch in 
the number of parameters).

I don't know of any way to pass in a cursor parameter (or any output parameter 
for that matter) to the stored procedure invocation from within the entity 
definition.  I would greatly appreciate if anyone could provide any pointers or 
hints on how to proceed.

Thanks so much for your time

Re: Query elevation / boosting or something else to guarantee document position

2012-05-31 Thread Michael Kuhlmann


Hi Wenca,

I'm a bit late, but maybe you're still interested.

There's no such functionality in standard Solr. With sorting, this is 
not possible, because sort functions only rank each single document, 
they know nothing about the position of the others. And query elevation 
is similar, you'll raise the score of independent documents.


To achieve this, you'll need your own QueryComponent. This isn't too 
complicated. You can't easily change the SolrIndexSearcher, which does 
the actual search job. But you can subclass 
org.apache.solr.handler.component.QueryComponent and override 
process(). Alas, the single main line - searcher.search() - is buried 
deep in the huge monster method process(), and you first have to check 
for shards, grouping and twenty thousand other parameters until you 
arrive at the code line you may want to extend.


Before calling search(), set the GET_DOCSET flag in your QueryCommand 
object, then execute the search. To check whether there's a document of 
the particular manufacturer in the result list, you can either
a) fetch the appropriate field value from the default field cache for 
every single result document until you find one; or
b) call getDocSet() on the SolrIndexSearcher with the manufacturer query 
as the parameter, and perform an and() operation on the resulting 
DocSet with the DocSet of your main query. (That's why you set the flag 
before.) You can then check which document that matches both the 
manufacturer and the main query fits best.


If you found a matching document, but it's behind pos. 5 in the 
resulting DocList, then you simply have to re-order your list.


If there's no such document within the DocList (which is limited by your 
rows parameter), but there are some in the joined DocSet from strategy 
b), then you can simply choose one of them and ignore the fact that this 
is probably not the best matching one. Or you have to patch Solr and 
modify getDocListNC() in SolrIndexSearcher (or one of the Collector 
classes), which is much more complicated.
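
The re-ordering step itself is simple once the result list is in hand. A minimal sketch in Python, working on a plain list of dicts instead of a Solr DocList; the function name, the `manufacturer` field, and the sample data are assumptions for illustration only:

```python
def promote_into_top(docs, is_preferred, top_n=5):
    """Return a copy of docs with one preferred doc inside the first top_n,
    provided the list contains a preferred doc at all."""
    if any(is_preferred(d) for d in docs[:top_n]):
        return list(docs)                    # already satisfied, keep order
    for i in range(top_n, len(docs)):
        if is_preferred(docs[i]):
            reordered = list(docs)
            promoted = reordered.pop(i)      # best-ranked preferred doc
            reordered.insert(top_n - 1, promoted)
            return reordered
    return list(docs)                        # nothing to promote


# Nine results in ranked order; only id 7 is from the preferred manufacturer.
docs = [{"id": n, "manufacturer": "acme" if n == 7 else "other"}
        for n in range(1, 10)]
result = promote_into_top(docs, lambda d: d["manufacturer"] == "acme")
result_ids = [d["id"] for d in result]
```

The promoted document takes the last slot of the first page, so all other documents keep their relative order. Pagination still needs care: later pages must skip the promoted document.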


Good luck!
-Kuli

Am 29.05.2012 14:26, schrieb Wenca:

Hi all,

I have an index with thousands of products with various fields
(manufacturer, price, popularity, type, color, ...) and I want to
guarantee at least one product by a particular manufacturer to be within
the first 5 results.

The search is done mainly by using filter params and results are ordered
by function e.g.: product(price, popularity) asc or by discount desc

And I need to guarantee that if there is any product matching the given
filters made by a concrete manufacturer, then it will be on the 5th
position at worst, even if the position by the order function is worse.

It seems to me that the Query elevation component is not the right thing
for me. I don't know the query in advance (or the set of filter
criteria) and I don't know concrete product that will be the best for
the criteria within the order.

And also I don't think that I can construct a function with such
requirements to use it directly for ordering the results.

Of course I can make a second query in case there is no desired product
on the first page of results and put it there, but it requires
additional request to solr and complicates results processing and
further pagination.

Can anybody suggest any solution?

Thanks
Wenca

Re: Hightlighting and excerpt

2012-05-31 Thread Jack Krupansky

Since highlighting, by definition, does highlight terms in excerpts 
(snippets or fragments from a text field), what else is it that you need?


-- Jack Krupansky

Re: Hightlighting and excerpt

2012-05-31 Thread Tolga

I need something like http://cl.ly/2o2E0g0S422d2p1X203h . See how TCMB 
was stressed?


Efficiently mining or parsing data out of XML source files

2012-05-31 Thread Van Tassell, Kristian

I'm just wondering what the general consensus is on indexing XML data to Solr 
in terms of parsing and mining the relevant data out of the file and putting 
them into Solr fields. Assume that this is the XML file and resulting Solr 
fields:

XML data:
<mydoc id="1234">
<title>foo</title>
<bar attr1="val1"/>
<baz>garbage data</baz>
</mydoc>

Solr Fields:
Id=1234
Title=foo
Bar=val1

I'd previously set this process up using XSLT and have since tested using 
XMLBeans, JAXB, etc. to get the relevant data. The speed at which this occurs, 
however, is not acceptable. 2800 objects take 11 minutes to parse and index 
into Solr.

The big slowdown appears to be that I'm parsing the data with an XML parser.

So, now I'm testing mining the data by opening the file as just a text file 
(using Groovy) and picking out relevant data using regular expression matching. 
I'm now able to parse (mine) the data and index the 2800 files in 72 seconds.

So I'm wondering if the typical solution people use is to go with a non-XML 
solution. It seems to make sense considering the search index would only want 
to store (as much data) as possible and not rely on the incoming documents 
being xml compliant.
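
To make the two approaches concrete, here is a small Python sketch that pulls the three fields out of the sample document both ways: once with a real XML parser and once with regular expressions. The function names and patterns are made up for illustration, and the regex variant assumes the exact layout of the sample (double-quoted attributes, no entities or CDATA in the values), so it is brittle by design.

```python
import re
import xml.etree.ElementTree as ET

SAMPLE = ('<mydoc id="1234"><title>foo</title>'
          '<bar attr1="val1"/><baz>garbage data</baz></mydoc>')


def fields_via_parser(xml_text):
    """Extract the Solr fields using a standards-aware XML parser."""
    root = ET.fromstring(xml_text)
    return {
        "Id": root.get("id"),
        "Title": root.findtext("title"),
        "Bar": root.find("bar").get("attr1"),
    }


ID_RE = re.compile(r'<mydoc\s+id="([^"]*)"')
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.S)
BAR_RE = re.compile(r'<bar\s+attr1="([^"]*)"')


def fields_via_regex(xml_text):
    """Extract the same fields by pattern matching on the raw text."""
    return {
        "Id": ID_RE.search(xml_text).group(1),
        "Title": TITLE_RE.search(xml_text).group(1),
        "Bar": BAR_RE.search(xml_text).group(1),
    }


parsed = fields_via_parser(SAMPLE)
matched = fields_via_regex(SAMPLE)
```

Both functions give the same result on the sample. They can diverge on real data: the parser decodes entities such as `&amp;` and rejects malformed input, while the regex returns the raw text and raises `AttributeError` when a pattern does not match.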

Thanks in advance for any thoughts on this!
-Kristian

Re: Hightlighting and excerpt

2012-05-31 Thread Jack Krupansky

Yes, that is what highlighting does - it extracts an excerpt and highlights 
search terms. You said you have highlighting working, so what else is it 
that you need?


Try /browse in the Solr example. It does exactly what your example shows. 
So, what else is it that you are trying to do? Or if something isn't 
working, what specifically isn't working?


-- Jack Krupansky

Re: Hightlighting and excerpt

2012-05-31 Thread Tolga

You mean http://www.example.com:8983/solr/browse? It says "unknown 
field 'cat'".


spellcheck collate with fq parameters SOLR-2010

2012-05-31 Thread Markus Jelsma

Hi,

It seems it doesn't work, or I cannot get it to work. I've tried both the 
IndexSpellchecker in Solr 3.2 and the DirectSpellchecker of trunk. The 
correctly spelled flag is correct when considering the fq parameters but the 
collation never is when using a filter. I've also tried 
spellcheck.maxCollationTries on trunk but any value higher than 0 (even very 
high) makes the collation element disappear. Are there any (open) issues 
that I'm not aware of?

Thanks,
Markus

Re: Hightlighting and excerpt

2012-05-31 Thread Jack Krupansky


The Solr example. As in the Solr tutorial.

See:
http://lucene.apache.org/solr/api/doc-files/tutorial.html

Index books.json from exampledocs and then enter a /browse request in your 
web browser. Add the wt=xml query parameter so that you can see the raw 
XML response that shows the highlighting section rather than the 
VelocityWriter output.


Since you said that highlighting was working for you, please post an example 
of the highlighting section of a Solr response.


-- Jack Krupansky

Re: Hightlighting and excerpt

2012-05-31 Thread Ahmet Arslan

 I need something like http://cl.ly/2o2E0g0S422d2p1X203h . See how TCMB 
 was stressed?

Hi Tolga, 

I think you can easily learn the basics using one of the following books.
http://lucene.apache.org/solr/books.html

Re: XInclude Multiple Elements

2012-05-31 Thread Bogdan Nicolau

I've also tried a lot of tricks to get xpointer working with multiple child
elements, to no success. 
In the end, I've resorted to a less pretty, other-way-around solution. I do
something like this:
solrconfig_common.xml - no xml declaration, no root tag, no nothing
<etc></etc>
<etc2></etc2>
...
For each file that I need the common stuff in, I'd do something like this:
solrconfig_master.xml/solrconfig_slave.xml/etc.
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE config [
<!ENTITY solrconfigcommon SYSTEM "solrconfig_common.xml">
]>

<config>
&solrconfigcommon;

</config>

Solr starts with 0 warnings, the configuration is properly loaded, etc.
Property substitution also works, including inside the
solrconfig_common.xml. Hope it helps anyone.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/XInclude-Multiple-Elements-tp3167658p3987029.html

RE: spellcheck collate with fq parameters SOLR-2010

2012-05-31 Thread Dyer, James

Markus,

When you set spellcheck.maxCollationTries to a value greater than zero, the 
spellchecker will query each collation candidate to determine how many hits it 
would return.  If the collation will not yield any hits, it throws it away then 
tries some more (up to whatever value you set).  You can verify the correctness 
of this by setting spellcheck.maxCollationTries to zero (no checking) and 
then re-trying the collation(s) it suggests by hand (with the same fq params, 
etc).
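
The check described above can be scripted by building the request parameters for the unchecked run and then re-issuing each suggested collation with the same filter. A sketch of the parameter-building part in Python; the query text and the `type:book` filter are invented examples, and the helper name is hypothetical:

```python
from urllib.parse import urlencode, parse_qs


def spellcheck_params(q, fq, max_collation_tries):
    """Build a query string for a spellcheck request with a filter query."""
    return urlencode([
        ("q", q),
        ("fq", fq),
        ("spellcheck", "true"),
        ("spellcheck.collate", "true"),
        ("spellcheck.maxCollationTries", str(max_collation_tries)),
    ])


# Step 1: no collation checking, so every candidate collation is returned.
unchecked = spellcheck_params("helo wrld", "type:book", 0)

# Step 2: re-run a suggested collation by hand with the same fq. Zero hits
# here would explain why the collation vanishes once checking is enabled.
retry = spellcheck_params("hello world", "type:book", 0)

decoded = parse_qs(unchecked)
```

Appending either string to the select handler URL gives the two requests to compare.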

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

RE: spellcheck collate with fq parameters SOLR-2010

2012-05-31 Thread Markus Jelsma

Thanks James, that works nicely!

Re: Using Data Import Handler to invoke a stored procedure with output (cursor) parameter

2012-05-31 Thread Michael Della Bitta

I could be wrong about this, but Oracle has a table() function that I
believe turns the output of a function into a table. So possibly you
could wrap your procedure in a function that returns the cursor, or
convert the procedure to a function.

Michael Della Bitta


Appinions, Inc. -- Where Influence Isn’t a Game.
http://www.appinions.com


Re: Accent Characters

2012-05-31 Thread Vicente Couto

Hello, guys.

Now it's working. Thank you both Jack and Sami.
I fixed my issue by just using server.query(query, METHOD.POST) in solrJ and
yes, I was using HttpSolrServer. I have to move on to CommonsHttpSolrServer.

Thank you very much.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Accent-Characters-tp3985931p3987046.html

Re: Multi-words synonyms matching

2012-05-31 Thread O. Klein

I have been struggling with this as well and found that using LUCENE_33 gives
the best results.

But as it will be deprecated, this is not a lasting solution. Maybe somebody
knows one?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multi-words-synonyms-matching-tp3898950p3987048.html

per-fieldtype similarity not working

2012-05-31 Thread Markus Jelsma

Hi,

We intend to use different similarity implementations for some field types, 
configured according to SOLR-2338. I double-checked against the schema in 
test-files and everything seems fine. However, the result is not correct and 
debugQuery shows the default configured similarity implementation is being used.

We simply declare the following in our fieldType:
<similarity class="FQCN"/>


Thanks,
Markus

Re: per-fieldtype similarity not working

2012-05-31 Thread Robert Muir

On Thu, May 31, 2012 at 11:23 AM, Markus Jelsma
markus.jel...@openindex.io wrote:

 We simply declare the following in our fieldType:
 <similarity class="FQCN"/>


That's not enough, see the example:
http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/test-files/solr/conf/schema-sim.xml


-- 
lucidimagination.com

Cannot get highlighting to work

2012-05-31 Thread Asfand Qazi


Hello,

I am having problems with highlighting on a Solr 3.6 instance, while it 
was working just fine before on our 1.4 instance.


The solrconfig.xml and schema.xml files are located here:

https://github.com/mpi2/mpi2_solr/blob/master/multicore/main/conf/schema.xml

(please note the incorrect line wrapping - it should be on one line)


https://github.com/mpi2/mpi2_solr/blob/master/multicore/main/conf/solrconfig.xml

(please note the incorrect line wrapping - it should be on one line)


The query I fire off (which worked on the 1.4 instance) is:

/solr/main/select?q=Cbx1&wt=json&hl=true&hl.fl=*&hl.usePhraseHighlighter=true

(please note the incorrect line wrapping - it should be on one line)

I expect a section like:
{
  "MGI:105369": {
    "symbol": [
      "<em>Cbx</em><em>1</em>"
    ],
    "marker_symbol": [
      "<em>Cbx</em><em>1</em>"
    ]
  }
}


I get:
{
  "MGI:105369": { }
}


Can anyone help?

Thanks


--
Regards,
  Asfand Yar Qazi
  Team 87 - High Throughput Gene Targeting
  Wellcome Trust Sanger Institute


--
The Wellcome Trust Sanger Institute is operated by Genome Research 
Limited, a charity registered in England with number 1021457 and a 
company registered in England with number 2742969, whose registered 
office is 215 Euston Road, London, NW1 2BE.

Stop Words in SpellCheckComponent

2012-05-31 Thread Matthias Müller

Hi,

is it possible to configure a stopword list to the SpellCheckComponent?

For example:
When searching for "the indexs", "the" is filtered because it is a stopword.
The SpellCheckComponent gives me a false suggestion for "the".
But the SpellCheckComponent should only give a suggestion for "index",
because "the" is a stopword.

Kind Regards

Matthias

Re: Solr with UIMA

2012-05-31 Thread debdoot

Hi Tommaso,

I have followed the steps you have listed to try to deploy the example
RoomNumberAnnotator with Solr 3.5.
Here is the error trace that I get:


org.apache.solr.common.SolrException: processing error: null. uid=5, text="Test Room HAW GN-K35..."
    at org.apache.solr.uima.processor.UIMAUpdateRequestProcessor.processAdd(UIMAUpdateRequestProcessor.java:107)
    at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:158)
    at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
    at com.ibm.ws.webcontainer.filter.FilterInstanceWrapper.doFilter(FilterInstanceWrapper.java:192)
    at com.ibm.ws.webcontainer.filter.WebAppFilterChain.doFilter(WebAppFilterChain.java:89)
    at com.ibm.ws.webcontainer.filter.WebAppFilterManager.doFilter(WebAppFilterManager.java:919)
    at com.ibm.ws.webcontainer.filter.WebAppFilterManager.invokeFilters(WebAppFilterManager.java:1016)
    at com.ibm.ws.webcontainer.webapp.WebApp.handleRequest(WebApp.java:3703)
    at com.ibm.ws.webcontainer.webapp.WebGroup.handleRequest(WebGroup.java:304)
    at com.ibm.ws.webcontainer.WebContainer.handleRequest(WebContainer.java:953)
    at com.ibm.ws.webcontainer.WSWebContainer.handleRequest(WSWebContainer.java:1655)
    at com.ibm.ws.webcontainer.channel.WCChannelLink.ready(WCChannelLink.java:195)
    at com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.handleDiscrimination(HttpInboundLink.java:452)
    at com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.handleNewRequest(HttpInboundLink.java:511)
    at com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.processRequest(HttpInboundLink.java:305)
    at com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.ready(HttpInboundLink.java:276)
    at com.ibm.ws.tcp.channel.impl.NewConnectionInitialReadCallback.sendToDiscriminators(NewConnectionInitialReadCallback.java:214)
    at com.ibm.ws.tcp.channel.impl.NewConnectionInitialReadCallback.complete(NewConnectionInitialReadCallback.java:113)
    at com.ibm.ws.tcp.channel.impl.AioReadCompletionListener.futureCompleted(AioReadCompletionListener.java:165)
    at com.ibm.io.async.AbstractAsyncFuture.invokeCallback(AbstractAsyncFuture.java:217)
    at com.ibm.io.async.AsyncChannelFuture.fireCompletionActions(AsyncChannelFuture.java:161)
    at com.ibm.io.async.AsyncFuture.completed(AsyncFuture.java:138)
    at com.ibm.io.async.ResultHandler.complete(ResultHandler.java:204)
    at com.ibm.io.async.ResultHandler.runEventProcessingLoop(ResultHandler.java:775)
    at com.ibm.io.async.ResultHandler$2.run(ResultHandler.java:905)
    at com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java:1650)
Caused by: org.apache.uima.resource.ResourceInitializationException
    at org.apache.solr.uima.processor.ae.OverridingParamsAEProvider.getAE(OverridingParamsAEProvider.java:86)
    at org.apache.solr.uima.processor.UIMAUpdateRequestProcessor.processText(UIMAUpdateRequestProcessor.java:144)
    at org.apache.solr.uima.processor.UIMAUpdateRequestProcessor.processAdd(UIMAUpdateRequestProcessor.java:77)
    ... 30 more
Caused by: java.lang.NullPointerException
    at org.apache.uima.util.XMLInputSource.<init>(XMLInputSource.java:118)
    at org.apache.solr.uima.processor.ae.OverridingParamsAEProvider.getAE(OverridingParamsAEProvider.java:58)
    ... 32 more

    at com.ibm.ws.webcontainer.webapp.WebAppDispatcherContext.sendError(WebAppDispatcherContext.java:624)
    at com.ibm.ws.webcontainer.webapp.WebAppDispatcherContext.sendError(WebAppDispatcherContext.java:642)
    at com.ibm.ws.webcontainer.srt.SRTServletResponse.sendError(SRTServletResponse.java:1235)
    at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:380)
    at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:326)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:265)



Please let me know if you have any insights on what could be the issue.

Thanks in advance,
Debdoot


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-with-UIMA-tp3863324p3987056.html
Sent from the Solr - User mailing list archive at Nabble.com.

Strip html

2012-05-31 Thread Tigunn

Hello,
I have a full-text index built from XML files.
Example:
---
<item type="fragment" n="3">
  <cit dbp:hand="GF-encre">

    si les <hi rend="underline">ruches d’<term>abeilles</term>
    </hi> prouvent la
    monarchie, les fourmillières, les troupes d’éléphants ou
    de <lb/>
    <choice>
      <orig>C</orig>
      <reg>c</reg>
    </choice>astors prouvent la
    république.
    <bibl xml:id="b-7468-3"/>
  </cit>
</item>
---
I use Solr 1.4.1 for full-text search from PHP. When I search for "castor",
this document is not found. But if I search for "c astor" it is found, which
is the problem.

I apply an XSLT transformation which returns:
---
si les ruches d’abeilles prouvent la
  monarchie, les fourmillières, les troupes d’éléphants ou
de castors prouvent la république.
---
I put this HTML into Solr:  $doc->addField('body_strip_html', $body_norm);

In schema.xml:
<fieldType name="text_strip_html" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>

AND

<field name="body_strip_html" type="text_strip_html" indexed="true" stored="true"/>


But this doesn't work!
I want this XML file (see the example) to be returned when I search for "castor".

Can you help me, please?
Thanks.


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Strip-html-tp3987051.html
Sent from the Solr - User mailing list archive at Nabble.com.

Data Import Handler fields with different values in column and name

2012-05-31 Thread Rafael Taboada

Hi folks,

I'm using Solr 3.6 and I'm trying to import data from my database to solr
using Data Import Handler. My db-config is like this:

<dataConfig>
   <dataSource driver="oracle.jdbc.OracleDriver"
      url="jdbc:oracle:thin:@localhost:1521:XE" user="admin" password="admin" />
   <document>
      <entity name="documento" query="SELECT iddocumento,nrodocumento,asunto FROM documento">
         <field column="iddocumento" name="iddocumento" />
         <field column="nrodocumento" name="nrodocumento" />
         <field column="asunto" name="asunto" />
      </entity>
   </document>
</dataConfig>

My problem arises when I use a name that differs from the column in the field tag,
for example:

         <field column="asunto" name="anotherasunto" />

When the name differs from the column, the field is omitted. Can
you please help me with this issue?

My schema.xml is:

   <types>
      <fieldtype name="string" class="solr.StrField" sortMissingLast="true" />
   </types>

   <fields>
      <!-- general -->
      <field name="iddocumento" type="string" indexed="true" stored="true" required="true" />
      <field name="nrodocumento" type="string" indexed="true" stored="true" />
      <field name="anotherasunto" type="string" indexed="true" stored="true" />
   </fields>

Thanks in advance!

-- 
Rafael Taboada

Re: Solr with UIMA

2012-05-31 Thread debdoot

Further observation on the error:

All requests to add documents through the /update URL land up with the same
error, irrespective of the fields contained in the document. If I don't use
the UIMAUpdateRequestProcessor, I can add/update documents successfully.

Here are the snippets relevant to updateRequestProcessor declarations in my
solrconfig.xml

<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.processor">uima</str>
  </lst>
</requestHandler>

<updateRequestProcessorChain name="uima">
  <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
    <lst name="uimaConfig">
      <lst name="runtimeParameters">
      </lst>
      <str name="analysisEngine">C:\ex1\RoomNumberAnnotator.xml</str>
      <bool name="ignoreErrors">false</bool>

      <lst name="analyzeFields">
        <bool name="merge">false</bool>
        <arr name="fields">
          <str>content</str>
        </arr>
      </lst>
      <lst name="fieldMappings">
        <lst name="type">
          <str name="name">org.apache.uima.tutorial.RoomNumber</str>
          <lst name="mapping">
            <str name="feature">building</str>
            <str name="field">UIMAname</str>
          </lst>
        </lst>
      </lst>
    </lst>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>


Please help.

Thanks
Debdoot

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-with-UIMA-tp3863324p3987083.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: Stop Words in SpellCheckComponent

2012-05-31 Thread Markus Jelsma

Add a stopwordfilter to your spellcheck field.


Re: Is optimize needed on slaves if it replicates from optimized master?

2012-05-31 Thread sudarshan

Walter,
 Thanks again. Can you specify the criteria by which Solr
optimizes/force-merges segments automatically? Is this defined by the
mergeFactor parameter, e.g. if the mergeFactor is 10, does a merge happen
for every 10 segments? Please explain.

Thanks,
Sudarshan 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-optimize-needed-on-slaves-if-it-replicates-from-optimized-master-tp3241604p3987086.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Merging Remote Solr Indexes?

2012-05-31 Thread sudarshan

Hi All,
   I'm new to Solr. I saw this post about merging indexes, and I
have a similar question. From the post, I understand that merging indexes
across different cores is possible only if the cores exist on a single
machine. I want to merge indexes from different machines. Can you please
explain the different ways of doing this?

Say I have N+1 Solr engines of which there are N different masters and the
remaining 1 is meant for merging all N indexes together.  How I have decided
to merge N indexes to 1 is this.

1. Dynamically edit the solrconfig.xml file of the (N+1)th system so that it acts as
a slave to a different master each time. A total of N rounds would be
needed to cover all N masters.
2. During every round I shall replicate the index of the master and store it
in a different folder: index1 from master1, index2 from master2, ...,
indexN from masterN.
3. After all indexes are replicated and moved/renamed to a local directory, I
shall perform a merge of all indexes.
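
For step 3, a minimal sketch of how the merge request might be assembled once the replicated index folders are local. It assumes the CoreAdmin mergeindexes action with one indexDir parameter per source directory; the host, the target core name "merged" and the directory paths are made up for illustration, and the exact behaviour should be checked against the CoreAdmin documentation for the Solr version in use.

```python
from urllib.parse import urlencode


def merge_indexes_url(base_url, target_core, index_dirs):
    """Build a CoreAdmin mergeindexes URL for local index directories.

    base_url, target_core and index_dirs are hypothetical values supplied
    by the caller; nothing here contacts a Solr server.
    """
    # A list of pairs keeps the repeated indexDir parameter, one per directory.
    params = [("action", "mergeindexes"), ("core", target_core)]
    params += [("indexDir", path) for path in index_dirs]
    return f"{base_url}/admin/cores?{urlencode(params)}"


url = merge_indexes_url(
    "http://localhost:8983/solr",           # assumed host and port
    "merged",                                # assumed name of the target core
    ["/data/replicas/index1", "/data/replicas/index2"],  # assumed local copies
)
print(url)
```

The sketch only builds the URL, so it can be tried without a running server. Sending it (for example with curl) and committing on the target core afterwards is left out on purpose.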


What problems will I have in implementing this? How efficient would it be?
I believe all index folders will have to be available locally to perform
the merge. If not, please tell me a better way to merge remote indexes.

Another question I have is about mergeFactor. If I set the mergeFactor to 5,
will Solr automatically take care of merging the segments into 1 when the
number of segments reaches 5? How can this be exploited?

Your assistance is sincerely appreciated.

Regards,
Sudarshan

 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Merging-Remote-Solr-Indexes-tp3434412p3987090.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Data Import Handler fields with different values in column and name

2012-05-31 Thread Jack Krupansky

Is there any chance that you added the anotherasunto field and then forgot 
to shut down and reload Solr? Any time you edit schema.xml or solrconfig.xml 
you need to reload Solr for the changes to take effect.


-- Jack Krupansky


Re: Data Import Handler fields with different values in column and name

2012-05-31 Thread Rafael Taboada

Jack,

Thanks for your help.

I restarted Solr every time I changed schema.xml.

Every document I have found on this says it is possible to map a column to a
different name value. But I can't get it to work.

Thanks again.

Rafael





-- 
Rafael Taboada

/*
 * Phone  992 741 026
 */

Re: Strip html

2012-05-31 Thread Jack Krupansky

There is no option in the Strip HTML filter to discard whitespace between 
elements. And it certainly doesn't know the semantics of some XML schema for 
<choice>. You'll have to pre-process those semantics before Solr ingestion, 
or write your own custom filter.
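
As an illustration of the pre-processing route, here is a rough sketch in Python (the original poster uses PHP and XSLT, so this is only one possible approach). It assumes that the regularized form inside <choice> (the <reg> child) is the one to index and that it should be glued to the text that follows without whitespace. The function names flatten_tei and _collect are invented for this example, and the sample fragment is simplified to avoid undeclared namespace prefixes.

```python
import re
import xml.etree.ElementTree as ET


def _collect(elem, parts):
    """Append the indexable text of elem to parts, in document order."""
    if elem.tag == "choice":
        # Assumption: keep only <reg>, dropping <orig> and the
        # whitespace between the children of <choice>.
        reg = elem.find("reg")
        if reg is not None and reg.text:
            parts.append(reg.text.strip())
        return
    if elem.text:
        parts.append(elem.text)
    for child in elem:
        _collect(child, parts)
        if child.tail:
            # Text that follows the child, e.g. "astors" after </choice>.
            parts.append(child.tail)


def flatten_tei(fragment):
    """Return the plain text of an XML fragment with <choice> resolved."""
    parts = []
    _collect(ET.fromstring(fragment), parts)
    # Collapse the line breaks and indentation left over from the markup.
    return re.sub(r"\s+", " ", "".join(parts)).strip()


sample = """<cit>de <lb/>
  <choice>
    <orig>C</orig>
    <reg>c</reg>
  </choice>astors prouvent la
  république.</cit>"""

print(flatten_tei(sample))
```

The resulting string contains "castors" as a single word, so it can be sent to a plain text field without relying on the HTML strip filter.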


-- Jack Krupansky


Re: Stop Words in SpellCheckComponent

2012-05-31 Thread Matthias Müller

 is it possible to configure a stopword list to the SpellCheckComponent?

 Add a stopwordfilter to your spellcheck field.

Hmm, I did. Could it be another mistake?

This is the schema definition:

<fieldType name="spellcheck_de" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent-nouml.txt" />
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^(.*)[\.\-\']$" replacement="$1" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="german_stop_long.txt" enablePositionIncrements="true" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

This is the solrconfig:

  <requestHandler name="search_de" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">edismax</str>
      <int name="rows">10</int>
      <str name="qf">text_de title_de^5</str>
      <str name="pf">text_de title_de^5</str>

      <str name="spellcheck">true</str>
      <str name="mm">0</str>
    </lst>

    <arr name="last-components">
      <str>spellcheck_de</str>
    </arr>
  </requestHandler>


  <searchComponent name="spellcheck_de" class="solr.SpellCheckComponent">
    <str name="queryAnalyzerFieldType">textSpell</str>
    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="field">spellcheck_de</str>
      <str name="spellcheckIndexDir">spellchecker_de</str>
      <str name="spellcheck.onlyMorePopular">true</str>
      <str name="buildOnOptimize">true</str>
    </lst>
  </searchComponent>

Fwd: Strip html

2012-05-31 Thread Michael Della Bitta

If I'm not mistaken, that's TEI, and I suggest you consult with the
TEI community for strategies for document indexing, as there are a lot
of branching-style tags in TEI. My guess is that you'll hear that it's
best to perform some sort of term expansion on the document as a
preprocessing step.

Michael Della Bitta


Appinions, Inc. -- Where Influence Isn’t a Game.
http://www.appinions.com






Re: Stop Words in SpellCheckComponent

2012-05-31 Thread Jack Krupansky

Spellcheck wants a field, not a field type. You have a spellcheck_de field 
type, but you need a field as well.


<str name="field">spellcheck_de</str>

That should reference a field, not a field type.

-- Jack Krupansky


possible status codes from solr during a (DIH) data import process

2012-05-31 Thread geeky2

hello all,

i have been asked to write a small polling script (bash) to periodically
check the status of an import on our Master.  our import times are small,
but there are business reasons why we want to know the status of an import
after a specified amount of time.

i need to perform certain actions based on the status of the import, and
therefore need to quantify which tags to check and their appropriate states.

i am using the command from the DataImportHandler HTTP API to get the status
of the import:

OUTPUT=$(curl -v "http://${SERVER}:${PORT}/somecore/dataimport?command=status")




can someone tell me if i have these rules correct?

1) during an import - the status tag will have a busy state:

example:

  <str name="status">busy</str>

2) at the completion of an import (regardless of failure or success) the
status tag will have an idle state:

example:

  <str name="status">idle</str>


3) to determine if an import failed or succeeded - you must interrogate the
tags under <lst name="statusMessages"> and specifically look for:

success: 
<str name="">Indexing completed. Added/Updated: 603378 documents. Deleted 0 documents.</str>

failure: 
<str name="">Indexing completed. Added/Updated: 603378 documents. Deleted 0 documents.</str>

thank you,


--
View this message in context: 
http://lucene.472066.n3.nabble.com/possible-status-codes-from-solr-during-a-DIH-data-import-process-tp3987110.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Is optimize needed on slaves if it replicates from optimized master?

2012-05-31 Thread Walter Underwood

http://wiki.apache.org/solr/SolrPerformanceFactors#mergeFactor

The defaults are very good. I have never changed them, and I've had Solr in 
production at two major sites, Netflix and Chegg.

Don't spend any more time worrying about merges.

wunder


Re: Merging Remote Solr Indexes?

2012-05-31 Thread Lance Norskog

Merging indexes is not really useful- it won't make distributed search
any faster. There are features that don't work with distributed
search. Really, you are better off having shards with enough documents
so that relevance scoring is balanced.




-- 
Lance Norskog
goks...@gmail.com

Re: Using Data Import Handler to invoke a stored procedure with output (cursor) parameter

2012-05-31 Thread Lance Norskog

Can you add a new stored procedure that uses your current one? It
would operate like the DIH expects.

I don't remember if DB cursors are a standard part of JDBC. If they
are, it would be a great addition to the DIH if they work right.

On Thu, May 31, 2012 at 10:44 AM, Niran Fajemisin afa...@yahoo.com wrote:
 Thanks for your response, Michael. Unfortunately changing the stored 
 procedure is not really an option here.

 From what I'm seeing, it would appear that there's really no way of somehow 
 instructing the Data Import Handler to get a handle on the output parameter 
 from the stored procedure. It's a bit surprising though that no one has ran 
 into this scenario but I suppose most people just work around it.

 Anyone else care to shed some more light on alternative approaches? Thanks 
 again.




 From: Michael Della Bitta michael.della.bi...@appinions.com
To: solr-user@lucene.apache.org
Sent: Thursday, May 31, 2012 9:40 AM
Subject: Re: Using Data Import Handler to invoke a stored procedure with 
output (cursor) parameter

I could be wrong about this, but Oracle has a table() function that I
believe turns the output of a function as a table. So possibly you
could wrap your procedure in a function that returns the cursor, or
convert the procedure to a function.

Michael Della Bitta


Appinions, Inc. -- Where Influence Isn’t a Game.
http://www.appinions.com


On Thu, May 31, 2012 at 8:00 AM, Niran Fajemisin afa...@yahoo.com wrote:
 Hi all,

 I've seen a few questions asked around invoking stored procedures from 
 within Data Import Handler but none of them seem to indicate what type of 
 output parameters were being used.

 I have a stored procedure created in Oracle database that takes a couple 
 input parameters and has an output parameter that is a reference cursor. 
 The cursor is expected to be used as a way of iterating through the 
 returned table rows. I'm using the following format to invoke my stored 
 procedure in the Data Import Handler's data config XML:

 entity name=entity_name ... query={call my_stored_proc(inParam1, 
 inParam2)} .../entity

 I have tested that this query works prior to attempting to use it from 
 within the DIH. But when I attempt to invoke this stored procedure, it 
 naturally complains that the output parameter is not specified (essentially 
 a mismatch in the number of parameters).

 I don't know of anyway to pass in a cursor parameter (or any output 
 parameter for that matter) to the stored procedure invocation from within 
 the entity definition.  I would greatly appreciate if anyone could 
 provide any pointers or hints on how to proceed.

 Thanks so much for your time







-- 
Lance Norskog
goks...@gmail.com

Re: Cannot get highlighting to work

2012-05-31 Thread Jack Krupansky

Try a query that uses a term that doesn't split an alphanumeric term into 
two terms.


Then check to see what field type you used for the symbol and marker_symbol 
fields and whether the analyzer for that field type has changed in 3.6.
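
To make the first suggestion concrete, here is a small sketch of what "splitting an alphanumeric term into two terms" looks like. It is not Solr code: it only imitates, with a regular expression, the letter/digit split that a word-delimiter style filter might apply, and whether the poster's field type actually uses such a filter is an assumption. The function name split_alnum is invented for this example.

```python
import re


def split_alnum(term):
    """Split a term into runs of ASCII letters and runs of digits."""
    return re.findall(r"[A-Za-z]+|[0-9]+", term)


print(split_alnum("Cbx1"))  # ['Cbx', '1']
print(split_alnum("Cbx"))   # ['Cbx']
```

A query for Cbx1 would therefore be analyzed as two terms, while Cbx stays a single term, which makes Cbx a simpler test case for checking whether highlighting works at all.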





-- Jack Krupansky

Re: index merge

2012-05-31 Thread sudarshan

Hi All,
   I have a basic question about index merging in Solr.  My setup is as
follows:

Setup:
I used the schema.xml that comes with the Solr example. I had three cores:
core0, core1 and core2.  I tried merging the indexes of core0 and core1
into core2.  I copied the same schema.xml from SOLR_HOME/example/solr/conf to
core0 and core1, changing only the name field to core0 and core1
respectively.

Operations:
I indexed different files to core0 and core1. The search *:* in Solr showed
6 files and 9 files for core0 and core1 respectively.  I then merged the
indexes of core0 and core1 into core2. As expected, the search *:* showed 15
files for core2. I added 2 new files to the index of core0 and 1 file to
core1 and merged again into core2. This time, to my surprise, *:* showed
33 files (15+18) instead of just 18. This duplication continued with each
merge operation, which is not efficient. Also, the merged files were
available for search only after restarting the Jetty server. Am I missing
something or doing something wrong? Is there a way to restart only a
specific core so that it reads the new index and reflects the merged
changes? Please explain the merge operation.
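
A minimal sketch of the arithmetic above, in Python with hypothetical document names. It assumes that a merge simply appends the source indexes to whatever the target core already holds, with no de-duplication on the unique key; that assumption is consistent with the counts reported here but is not confirmed in this thread.

```python
def merge_into(target, *sources):
    """Model a merge as a plain append: nothing already in target is replaced."""
    for source in sources:
        target.extend(source)
    return target


# Hypothetical document ids standing in for the indexed files.
core0 = [f"core0-file{i}" for i in range(6)]
core1 = [f"core1-file{i}" for i in range(9)]
core2 = []

merge_into(core2, core0, core1)
first_count = len(core2)          # 6 + 9 = 15

core0 += ["core0-file6", "core0-file7"]   # 2 new files
core1 += ["core1-file9"]                  # 1 new file
merge_into(core2, core0, core1)
second_count = len(core2)         # 15 + 18 = 33
unique_count = len(set(core2))    # 18 distinct files

print(first_count, second_count, unique_count)
```

Under that assumption, merging into an emptied target each time would give 18 rather than 33.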

Thanks,
Sudarshan   



--
View this message in context: 
http://lucene.472066.n3.nabble.com/index-merge-tp472904p3987121.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: possible status codes from solr during a (DIH) data import process

2012-05-31 Thread Dyer, James

You've got it right.  Here's a summary:

- status = busy means it's in process.
- status = idle means it's finished (success or failure).
- You can drill down further by looking at the sub-elements under statusMessages:
  - If there is a <str name="Aborted" /> element, the last import was cancelled
    with command=abort.
  - Otherwise, look at the body of <str name="" />.
    o If it begins with "Indexing completed.", then it finished with a success.
    o If it begins with "Indexing failed.", then it finished with a failure.

Just be careful to test your script whenever you change DIH versions.  This 
status screen isn't the best and no doubt it will change sometime in the 
future.  Also, keep in mind that as soon as the next import begins the old 
statuses get lost so you'll need to plan your script runs around that.
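
The rules summarized above could be checked in a script along these lines. This is a sketch in Python rather than bash; the function name classify_dih_status and the returned labels are made up for illustration, and the XML layout is assumed to match the snippets quoted in this thread.

```python
import xml.etree.ElementTree as ET


def classify_dih_status(xml_text):
    """Return 'busy', 'aborted', 'success', 'failure' or 'unknown'."""
    root = ET.fromstring(xml_text)
    status = root.findtext("str[@name='status']")
    if status == "busy":
        return "busy"
    if status != "idle":
        return "unknown"

    messages = root.find("lst[@name='statusMessages']")
    entries = {}
    if messages is not None:
        for child in messages.findall("str"):
            entries[child.get("name")] = child.text or ""

    if "Aborted" in entries:
        return "aborted"
    summary = entries.get("", "")   # the <str name=""> element
    if summary.startswith("Indexing completed."):
        return "success"
    if summary.startswith("Indexing failed."):
        return "failure"
    return "unknown"


SAMPLE = (
    '<response><str name="status">idle</str>'
    '<lst name="statusMessages"><str name="">Indexing completed. '
    'Added/Updated: 603378 documents. Deleted 0 documents.</str></lst>'
    '</response>'
)
print(classify_dih_status(SAMPLE))
```

Because the message texts may change between DIH versions, the string prefixes are best kept in one place and re-tested after upgrades.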

Someday it'll be nice if we can come up with a better way than this to 
programmatically interact with DIH...

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: geeky2 [mailto:gee...@hotmail.com] 
Sent: Thursday, May 31, 2012 2:43 PM
To: solr-user@lucene.apache.org
Subject: possible status codes from solr during a (DIH) data import process

hello all,

i have been asked to write a small polling script (bash) to periodically
check the status of an import on our Master.  our import times are small,
but there are business reasons why we want to know the status of an import
after a specified amount of time.

i need to perform certain actions based on the status of the import, and
therefore need to quantify which tags to check and their appropriate states.

i am using the command from the DataImportHandler HTTP API to get the status
of the import:

OUTPUT=$(curl -v "http://${SERVER}:${PORT}/somecore/dataimport?command=status")




can someone tell me if i have these rules correct?

1) during an import - the status tag will have a busy state:

example:

  <str name="status">busy</str>

2) at the completion of an import (regardless of failure or success) the
status tag will have an idle state:

example:

  <str name="status">idle</str>


3) to determine if an import failed or succeeded - you must interrogate the
tags under <lst name="statusMessages"> and specifically look for :

success: 
<str name="">Indexing completed. Added/Updated: 603378 documents. Deleted 0
documents.</str>

failure: 
<str name="">Indexing completed. Added/Updated: 603378 documents. Deleted 0
documents.</str>

thank you,


--
View this message in context: 
http://lucene.472066.n3.nabble.com/possible-status-codes-from-solr-during-a-DIH-data-import-process-tp3987110.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: possible status codes from solr during a (DIH) data import process

2012-05-31 Thread Rahul Warawdekar

Hi,

That's correct.
For failure, you have to check for the text "Indexing failed. Rolled back
changes" under the <lst name="statusMessages"> tag.
One more thing to note here is that there may be a time during the indexing
process when the indexing is complete but the index is not yet committed and
optimized.
You would need to check that the entries listed below are present along with
the success message before treating it as a complete success.

<str name="Committed">2012-05-31 15:10:45</str>
<str name="Optimized">2012-05-31 15:10:45</str>
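
A sketch of that extra check, in Python. The function name is_fully_finished and the require_optimized switch are hypothetical; the element names are the ones quoted above, and the surrounding response layout is an assumption.

```python
import xml.etree.ElementTree as ET


def is_fully_finished(xml_text, require_optimized=True):
    """True only if the success message AND the commit markers are present."""
    root = ET.fromstring(xml_text)
    messages = root.find("lst[@name='statusMessages']")
    if messages is None:
        return False

    entries = {c.get("name"): (c.text or "") for c in messages.findall("str")}
    required = {"Committed"}
    if require_optimized:
        required.add("Optimized")

    completed = entries.get("", "").startswith("Indexing completed.")
    return completed and required.issubset(entries)
```

An import that reports "Indexing completed." but has no Committed entry yet would be treated as still in progress by this check.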





-- 
Thanks and Regards
Rahul A. Warawdekar

Fwd: Data Import Handler fields with different values in column and name

2012-05-31 Thread Rafael Taboada

Please,

Can anyone guide me through this issue? Thanks



-- Forwarded message --
From: Rafael Taboada kaliman.fore...@gmail.com
Date: Thu, May 31, 2012 at 12:30 PM
Subject: Data Import Handler fields with different values in column and name
To: solr-user@lucene.apache.org


Hi folks,

I'm using Solr 3.6 and I'm trying to import data from my database to solr
using Data Import Handler. My db-config is like this:

<dataConfig>
   <dataSource driver="oracle.jdbc.OracleDriver"
      url="jdbc:oracle:thin:@localhost:1521:XE" user="admin" password="admin" />
   <document>
      <entity name="documento" query="SELECT
         iddocumento,nrodocumento,asunto FROM documento">
         <field column="iddocumento" name="iddocumento" />
         <field column="nrodocumento" name="nrodocumento" />
         <field column="asunto" name="asunto" />
      </entity>
   </document>
</dataConfig>

My problem arises when I try to use a different value for name than for column
in the field tag, for example:

 <field column="asunto" name="anotherasunto" />

When the name differs from the column, the field is omitted. Can you please
help me with this issue?

My schema.xml is:

<types>
   <fieldtype name="string" class="solr.StrField" sortMissingLast="true" />
</types>

<fields>
   <!-- general -->
   <field name="iddocumento" type="string" indexed="true" stored="true"
      required="true" />
   <field name="nrodocumento" type="string" indexed="true" stored="true" />
   <field name="anotherasunto" type="string" indexed="true" stored="true" />
</fields>

Thanks in advance!

-- 
Rafael Taboada






-- 
Rafael Taboada

/*
 * Phone  992 741 026
 */

Re: possible status codes from solr during a (DIH) data import process

2012-05-31 Thread jmlucjav

there is at least one scenario where no error is reported when it should be,
if the host runs out of disk when optimizing, it is not reported.

There is a jira issue open I think

--
View this message in context: 
http://lucene.472066.n3.nabble.com/possible-status-codes-from-solr-during-a-DIH-data-import-process-tp3987110p3987144.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Strip html

2012-05-31 Thread Chris Hostetter


: I make a transformation XSLT which return :
: ---
: si les ruches d’abeilles prouvent la
:   monarchie, les fourmillières, les troupes d’éléphants ou
: de castors prouvent la république.
: ---
: i put this html in solr:  $doc->addField('body_strip_html', $body_norm);
...
: But this don't work!
: I want to return this xml files (look exemple) if i search castor.

I'm confused.

a) you said you've already transformed your input XML into plain text -- 
so i don't see why you need HTML stripping at all.
b) your current problem doesn't seem to have anything to do with HTML or 
XML ... you're asking why a document containing "castors" (plural) doesn't 
match a query for "castor" (singular) but the field type you say you are using 
has a very simple analyzer that doesn't do any stemming of any kind...

<analyzer>
  <charFilter class="solr.HTMLStripCharFilterFactory"/>
  <tokenizer class="solr.StandardTokenizerFactory"/>
</analyzer>

..since there is no HTML in your input, HTMLStripCharFilterFactory is a 
no-op.  which leaves StandardTokenizerFactory which just does 
tokenization.

It seems like all you need to do is add a stemmer (and for efficiency: 
remove the HTMLStripCharFilterFactory).  I'm no expert, but it looks like 
you are indexing french, so i would suggest using a french stemmer...

https://wiki.apache.org/solr/LanguageAnalysis#French
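
To illustrate the point about stemming, here is a toy sketch in Python. It is not Solr's analysis chain: tokenize is a simple regex split and toy_stem is a made-up rule that only strips a trailing "s", standing in for a real French stemmer.

```python
import re


def tokenize(text):
    """Lowercase and split on non-word characters (tokenization only)."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def toy_stem(token):
    """Toy rule for illustration only: drop a trailing plural 's'."""
    if len(token) > 3 and token.endswith("s"):
        return token[:-1]
    return token


text = "les troupes d’éléphants ou de castors prouvent la république"

plain_terms = set(tokenize(text))
stemmed_terms = {toy_stem(token) for token in tokenize(text)}

# Without stemming the index holds "castors", so "castor" finds nothing.
print("castor" in plain_terms)
# With the same stemming applied at index and query time, the terms meet.
print(toy_stem("castor") in stemmed_terms)
```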



-Hoss

Re: Solr with UIMA

2012-05-31 Thread Jack Krupansky

Is it failing on the first document? I see uid=5, which suggests that it is 
not. If not, how is this document different from the others?


I see the exception
org.apache.uima.resource.ResourceInitializationException, suggesting that 
some file cannot be loaded.


It sounds like it may be having trouble loading aePath (analysisEngine). 
Or maybe some other file?


-- Jack Krupansky

-Original Message- 
From: debdoot

Sent: Thursday, May 31, 2012 11:59 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr with UIMA

Hi Tommaso,

I have followed the steps you have listed to try to deploy the example
RoomNumberAnnotator with Solr 3.5.
Here is the error trace that I get:


org.apache.solr.common.SolrException: processing error: null. uid=5, text="Test Room HAW GN-K35..."
    at org.apache.solr.uima.processor.UIMAUpdateRequestProcessor.processAdd(UIMAUpdateRequestProcessor.java:107)
    at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:158)
    at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
    at com.ibm.ws.webcontainer.filter.FilterInstanceWrapper.doFilter(FilterInstanceWrapper.java:192)
    at com.ibm.ws.webcontainer.filter.WebAppFilterChain.doFilter(WebAppFilterChain.java:89)
    at com.ibm.ws.webcontainer.filter.WebAppFilterManager.doFilter(WebAppFilterManager.java:919)
    at com.ibm.ws.webcontainer.filter.WebAppFilterManager.invokeFilters(WebAppFilterManager.java:1016)
    at com.ibm.ws.webcontainer.webapp.WebApp.handleRequest(WebApp.java:3703)
    at com.ibm.ws.webcontainer.webapp.WebGroup.handleRequest(WebGroup.java:304)
    at com.ibm.ws.webcontainer.WebContainer.handleRequest(WebContainer.java:953)
    at com.ibm.ws.webcontainer.WSWebContainer.handleRequest(WSWebContainer.java:1655)
    at com.ibm.ws.webcontainer.channel.WCChannelLink.ready(WCChannelLink.java:195)
    at com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.handleDiscrimination(HttpInboundLink.java:452)
    at com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.handleNewRequest(HttpInboundLink.java:511)
    at com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.processRequest(HttpInboundLink.java:305)
    at com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.ready(HttpInboundLink.java:276)
    at com.ibm.ws.tcp.channel.impl.NewConnectionInitialReadCallback.sendToDiscriminators(NewConnectionInitialReadCallback.java:214)
    at com.ibm.ws.tcp.channel.impl.NewConnectionInitialReadCallback.complete(NewConnectionInitialReadCallback.java:113)
    at com.ibm.ws.tcp.channel.impl.AioReadCompletionListener.futureCompleted(AioReadCompletionListener.java:165)
    at com.ibm.io.async.AbstractAsyncFuture.invokeCallback(AbstractAsyncFuture.java:217)
    at com.ibm.io.async.AsyncChannelFuture.fireCompletionActions(AsyncChannelFuture.java:161)
    at com.ibm.io.async.AsyncFuture.completed(AsyncFuture.java:138)
    at com.ibm.io.async.ResultHandler.complete(ResultHandler.java:204)
    at com.ibm.io.async.ResultHandler.runEventProcessingLoop(ResultHandler.java:775)
    at com.ibm.io.async.ResultHandler$2.run(ResultHandler.java:905)
    at com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java:1650)
Caused by: org.apache.uima.resource.ResourceInitializationException
    at org.apache.solr.uima.processor.ae.OverridingParamsAEProvider.getAE(OverridingParamsAEProvider.java:86)
    at org.apache.solr.uima.processor.UIMAUpdateRequestProcessor.processText(UIMAUpdateRequestProcessor.java:144)
    at org.apache.solr.uima.processor.UIMAUpdateRequestProcessor.processAdd(UIMAUpdateRequestProcessor.java:77)
    ... 30 more
Caused by: java.lang.NullPointerException
    at org.apache.uima.util.XMLInputSource.<init>(XMLInputSource.java:118)
    at org.apache.solr.uima.processor.ae.OverridingParamsAEProvider.getAE(OverridingParamsAEProvider.java:58)
    ... 32 more

    at com.ibm.ws.webcontainer.webapp.WebAppDispatcherContext.sendError(WebAppDispatcherContext.java:624)
    at com.ibm.ws.webcontainer.webapp.WebAppDispatcherContext.sendError(WebAppDispatcherContext.java:642)
    at com.ibm.ws.webcontainer.srt.SRTServletResponse.sendError(SRTServletResponse.java:1235)
    at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:380)
    at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:326)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:265)



Please let me know if you have any insights on what could be the issue.

Thanks in advance,
Debdoot


--
View

Re: Fwd: Data Import Handler fields with different values in column and name

2012-05-31 Thread Jack Krupansky


It looks okay; renaming a column is fine.

Maybe... maybe when you re-run it DIH is not replacing any documents that 
already have id's in Solr, leaving them with their old field values. Maybe 
you need to manually delete the old Solr documents and run a fresh full 
import.
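
As a rough illustration of what the column/name mapping is expected to do, here is a sketch in Python. It is not DIH code: map_row and the sample rows are hypothetical. The second call shows one thing that may be worth checking, namely whether the database driver reports column names in a different case than the config uses; that possibility is an assumption and is not confirmed anywhere in this thread.

```python
def map_row(row, field_map):
    """Rename columns per field_map; unmapped columns keep their own name."""
    doc = {}
    for column, value in row.items():
        doc[field_map.get(column, column)] = value
    return doc


field_map = {"asunto": "anotherasunto"}

# Column name matches the config exactly: the rename is applied.
print(map_row({"iddocumento": "1", "asunto": "hola"}, field_map))

# Hypothetical case mismatch: the lookup misses and the rename is skipped.
print(map_row({"IDDOCUMENTO": "1", "ASUNTO": "hola"}, field_map))
```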


-- Jack Krupansky


Re: Fwd: Data Import Handler fields with different values in column and name

2012-05-31 Thread Rafael Taboada

Hi Jack,

Thanks for your help.

I delete conf/data/* on every restart to make sure I am working with clean data.

Is there any other configuration I should change? Maybe another XML file?

Kind regards




-- 
Rafael Taboada

/*
 * Phone  992 741 026
 */

index special characters solr

2012-05-31 Thread KPK

Hi all
Can somebody please tell me how I can build an index in Solr where one of my
fields contains special characters like $ and %?
I would also like to search on those same characters in that particular field.

Any advice would be appreciated.

Thanks

--
View this message in context: 
http://lucene.472066.n3.nabble.com/index-special-characters-solr-tp3987157.html
Sent from the Solr - User mailing list archive at Nabble.com.

Challenge: Is dynamic data source possible for DataImportHandler JdbcDataSource?

2012-05-31 Thread Cheng Zhang

Hi,

The challenge I'm facing is some sort of dynamic data source. Your valuable 
input is highly appreciated.

Below is my data-config.xml. I have one user database and two company 
databases. The user table in the user database has four columns which are id + 
name + company_dbname + company_id. Depending on the company_dbname, I need to 
look up either companydb0 or companydb1 to get the company name by the 
company_id. 

<dataConfig>
    <dataSource type="JdbcDataSource"
        name="userdb"
        driver="com.mysql.jdbc.Driver"
        url="jdbc:mysql://db0.com:3306/user"
        user="xxx"
        password="calltextual" batchSize="-1"/>

    <dataSource type="JdbcDataSource"
        name="companydb0"
        driver="com.mysql.jdbc.Driver"
        url="jdbc:mysql://companydb0.com:3306/company"
        user="xxx"
        password="calltextual" batchSize="-1"/>

    <dataSource type="JdbcDataSource"
        name="companydb1"
        driver="com.mysql.jdbc.Driver"
        url="jdbc:mysql://companydb1.com:3306/company"
        user="xxx"
        password="calltextual" batchSize="-1"/>

    <document name="USERS">
        <entity name="USER" dataSource="userdb"
            query="SELECT id, name, company_dbname, company_id from user">
            <field column="id" name="id" />
            <field column="name" name="name" />
            <entity name="company" dataSource="${USER.company_dbname}"
                query="SELECT name from company
                WHERE id = '${PG0.company_id}'">
                <field column="name" name="company_name" />
            </entity>
        </entity>
    </document>
</dataConfig>

Is it doable to set the data source dynamically for the child entity? In my 
case, I would like to set company entity dataSource to 
${USER.company_dbname}  which is returned from USER entity query.
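
The intended behaviour can be sketched as follows, in Python with plain dictionaries standing in for the three databases. All names and sample values are hypothetical, and the sketch says nothing about whether DIH resolves variables in the dataSource attribute; it only models the per-row lookup described above.

```python
# Hypothetical stand-ins for the user database and the two company databases.
USER_ROWS = [
    {"id": 1, "name": "alice", "company_dbname": "companydb0", "company_id": 7},
    {"id": 2, "name": "bob", "company_dbname": "companydb1", "company_id": 7},
]
COMPANY_SOURCES = {
    "companydb0": {7: "Company A"},
    "companydb1": {7: "Company B"},
}


def build_user_docs(user_rows, company_sources):
    """Pick the company source per row, then look up the name by company_id."""
    docs = []
    for row in user_rows:
        source = company_sources[row["company_dbname"]]
        docs.append({
            "id": row["id"],
            "name": row["name"],
            "company_name": source[row["company_id"]],
        })
    return docs


for doc in build_user_docs(USER_ROWS, COMPANY_SOURCES):
    print(doc)
```

The same company_id resolves to a different name depending on which source the user row points at, which is the behaviour being asked for.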

If it's not doable with the current implementation, I would like to download 
the source code and customize it for my needs. Which Java source file should I 
start with?

Many many thanks,

Kevin

Re: index special characters solr

2012-05-31 Thread Jack Krupansky

Special characters are filtered out of (most) text fields, but are 
preserved in string fields. String fields might suit your needs, but are 
inconvenient for keyword searching.


You may be able to use the types option of the WordDelimiterFilterFactory 
to pass in a custom character type table that has the special characters 
treated as alphabetic characters. Otherwise, you may have to customize the 
code yourself.
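
The difference can be sketched in Python. This is a toy model, not Solr's analysis code: text_field_tokens mimics a tokenizer that keeps only letters and digits, string_field_terms keeps the whole value as a single term, and custom_tokens mimics the effect of treating chosen special characters as alphabetic. All three function names are made up for illustration.

```python
import re


def text_field_tokens(value):
    """Keep only runs of letters and digits; $ and % are dropped."""
    return re.findall(r"[a-z0-9]+", value.lower())


def string_field_terms(value):
    """A string field indexes the whole value verbatim as one term."""
    return [value]


def custom_tokens(value, extra_word_chars="$%"):
    """Treat the extra characters as part of a word, like a custom type table."""
    pattern = "[a-z0-9" + re.escape(extra_word_chars) + "]+"
    return re.findall(pattern, value.lower())


value = "Save 20% on $5 items"
print(text_field_tokens(value))
print(string_field_terms(value))
print(custom_tokens(value))
```

With the string field only an exact match on the full value succeeds, which is why it is inconvenient for keyword searching.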


-- Jack Krupansky


Re: Stop Words in SpellCheckComponent

2012-05-31 Thread Matthias Müller

> <str name="field">spellcheck_de</str>
>
> That should reference a field, not a field type.

Thanks for your help. But I did that, too.

Here I'll show that even the Solr example webapp makes suggestions for
stopwords. I've ...

1. added "the" to stopwords.txt
2. added "thex" to an example document (field "name")
3. started Solr
4. indexed the example files (sh post.sh *.xml)
5. searched for "the solr":
http://myhost:8983/solr/select?q=the+solr&spellcheck=true&wt=json
6. got the desired result, but also the wrong suggestion "thex"

{ "response" : { "docs" : [ {...  "name" : "Solr, thex Enterprise
Search Server", ..  } ],
  "numFound" : 1,
...  },
...
  "spellcheck" : { "suggestions" : [ "the",
  {... "suggestion" : [ "thex" ]  }
] }
}


Here's the complete diff between the original download and my 3 modifications:

diff -r apache-solr-3.6.0/example/exampledocs/solr.xml apache-solr-3.6.0x/example/exampledocs/solr.xml
21c21
<   <field name="name">Solr, the Enterprise Search Server</field>
---
>   <field name="name">Solr, thex Enterprise Search Server</field>
diff -r apache-solr-3.6.0/example/solr/conf/solrconfig.xml apache-solr-3.6.0x/example/solr/conf/solrconfig.xml
781a782,785
>      <arr name="last-components">
>        <str>spellcheck</str>
>      </arr>
>
1122a1127
>     <str name="buildOnCommit">true</str>
diff -r apache-solr-3.6.0/example/solr/conf/stopwords.txt apache-solr-3.6.0x/example/solr/conf/stopwords.txt
14a15,16
>
> the