loading solr from Pig?

2013-08-21 Thread geeky2
Hello All,

Is anyone loading Solr from a Pig script / process?

I was talking to another group in our company and they have standardized on
MongoDB instead of Solr - apparently there is very good support between
MongoDB and Pig - allowing users to stream data directly from a Pig
process into MongoDB.

Does solr have anything like this as well?
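
if nothing packaged exists, i am guessing something rough like this could
work - streaming tuples out of Pig through a shell script that posts to
solr's json update handler.  this is an untested sketch; the script name,
host, core, and field names are all made up:

#!/bin/bash
# solr_push.sh - rough sketch, untested.  meant to be called from Pig via
# the STREAM operator, e.g.:
#   DEFINE solr_push `solr_push.sh`;
#   dummy = STREAM parts THROUGH solr_push;
# assumes tab-separated tuples of (id, itemModelNo) and a core named core1.

SOLR_URL="http://localhost:8983/solr/core1/update/json"

while IFS=$'\t' read -r ID MODEL; do
    # one json document per input tuple (batching would be better)
    DOC="{\"add\":{\"doc\":{\"id\":\"${ID}\",\"itemModelNo\":\"${MODEL}\"}}}"
    curl -s -H 'Content-type: application/json' -d "$DOC" "$SOLR_URL"
done

# make the new documents visible once the stream is drained
curl -s -H 'Content-type: application/json' -d '{"commit":{}}' "$SOLR_URL"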

thx
mark









Re: translating a character code to an ordinal?

2013-06-10 Thread geeky2
i will try it.

i guess i made a poor assumption that you would not get predictable
results when copying a code like mycode to an int field where the
desired end result in the int field is, say, 1.

i was worried that some sort of ascii conversion or wrap around would
happen in the int field.

thx for the insight.

mark






Re: translating a character code to an ordinal?

2013-06-10 Thread geeky2
i will try it out and let you know - 







translating a character code to an ordinal?

2013-06-07 Thread geeky2
hello all,

environment: solr 3.5, centos

problem statement:  i have several character codes that i want to translate
to ordinal (integer) values (for sorting), while retaining the original code
field in the document.

i was thinking that i could use a copyField from my code field to my ord
field - then employ a pattern replace filter factory during indexing.

but won't the copyfield fail because the two field types are different?

ps: i also read the wiki about
http://wiki.apache.org/solr/DataImportHandler#Transformer the script
transformer and regex transformer - but was hoping to avoid this - if i
could.
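
for reference, here is a rough, untested sketch of what i was picturing -
note the ord field can NOT be a true int field, because copyField hands the
raw (pre-analysis) value to the target and a code like ABC would fail to
parse; a single-token TextField sorts fine though.  the type, field, and
code names below are all made up:

cat <<'EOF'
<fieldType name="code_ordinal" class="solr.TextField" sortMissingLast="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- one pattern per known code; unknown codes pass through unchanged -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="^ABC$"
            replacement="1" replace="all"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^DEF$"
            replacement="2" replace="all"/>
  </analyzer>
</fieldType>

<field name="code"    type="string"       indexed="true" stored="true"/>
<field name="codeOrd" type="code_ordinal" indexed="true" stored="false"/>
<copyField source="code" dest="codeOrd"/>
EOF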




thx
mark






Re: translating a character code to an ordinal?

2013-06-07 Thread geeky2
hello jack,

thank you for the code ;)

what book are you referring to?  AFAICT - all of the 4.0 books are on
future order.

we won't be moving to 4.0 (soon enough).

so i take it - copyfield will not work, eg - i cannot take a code like ABC
and copy it to an int field and then use the regex to turn it into an
ordinal?

thx
mark






Re: translating a character code to an ordinal?

2013-06-07 Thread geeky2
thx,


please send me a link to the book so i can get/purchase it.


thx
mark







custom field tutorial

2013-06-07 Thread geeky2
can someone point me to a custom field tutorial.

i checked the wiki and this list - but i'm still a little hazy on how i would
do this.

essentially - when the user issues a query, i want my class to interrogate a
string field (containing several codes - example boo, baz, bar) 

and return a single integer field that maps to the string field (containing
the code).

example: 

boo=1
baz=2
bar=3
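
for context, here is a rough sketch of how the mapping could be done at index
time in the DIH query instead of a custom field class (untested; the table,
column, and field names are made up):

cat <<'EOF'
<entity name="parts" query="select
        p.id,
        p.code,
        case p.code
            when 'boo' then 1
            when 'baz' then 2
            when 'bar' then 3
            else 0
        end as code_ord
    from some_parts_table p">
    <field column="id"       name="id"/>
    <field column="code"     name="code"/>
    <field column="code_ord" name="codeOrd"/>
</entity>
EOF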

thx
mark







Re: seeing lots of autowarming messages in log during DIH indexing

2013-05-31 Thread geeky2
the DIH is launched via a script - called by a cron-like scheduler.

clean, commit and optimize are all true.

thx
mark



#!/bin/bash
SERVER=$1
PORT=$2
CLEAN=$3
COMMIT=$4
OPTIMIZE=$5
COREPATH=$6

# validate the argument count before using any of the arguments
if [ "$#" -ne 6 ]; then
    echo "USAGE: $0 [SERVER] [PORT] [CLEAN: true/false] [COMMIT: true/false] [OPTIMIZE: true/false] [COREPATH]"
    exit 1
fi

echo SERVER: $SERVER
echo PORT: $PORT
echo CLEAN: $CLEAN
echo COMMIT: $COMMIT
echo OPTIMIZE: $OPTIMIZE
echo COREPATH: $COREPATH

...








seeing lots of autowarming messages in log during DIH indexing

2013-05-20 Thread geeky2
hello,

we are tracking down some performance issues with our DIH process.

not sure if this is related - but i am seeing tons of the messages below in
the logs during re-indexing of the core.

what do these messages mean?


2013-05-18 19:37:30,623 INFO  [org.apache.solr.update.UpdateHandler]
(pool-11-thread-1) end_commit_flush
2013-05-18 19:37:30,623 INFO  [org.apache.solr.search.SolrIndexSearcher]
(pool-10-thread-1) autowarming Searcher@5b8d745 main from Searcher@1fb355af
main
   
fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
2013-05-18 19:37:30,624 INFO  [org.apache.solr.search.SolrIndexSearcher]
(pool-10-thread-1) autowarming result for Searcher@5b8d745 main
   
fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
2013-05-18 19:37:30,624 INFO  [org.apache.solr.search.SolrIndexSearcher]
(pool-10-thread-1) autowarming Searcher@5b8d745 main from Searcher@1fb355af
main
   
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
2013-05-18 19:37:30,625 INFO  [org.apache.solr.search.SolrIndexSearcher]
(pool-10-thread-1) autowarming result for Searcher@5b8d745 main
   
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
2013-05-18 19:37:30,625 INFO  [org.apache.solr.search.SolrIndexSearcher]
(pool-10-thread-1) autowarming Searcher@5b8d745 main from Searcher@1fb355af
main
   
queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=3,evictions=0,size=3,warmupTime=1,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
2013-05-18 19:37:30,628 INFO  [org.apache.solr.search.SolrIndexSearcher]
(pool-10-thread-1) autowarming result for Searcher@5b8d745 main
   
queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=3,evictions=0,size=3,warmupTime=3,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
2013-05-18 19:37:30,628 INFO  [org.apache.solr.search.SolrIndexSearcher]
(pool-10-thread-1) autowarming Searcher@5b8d745 main from Searcher@1fb355af
main
   
documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
2013-05-18 19:37:30,628 INFO  [org.apache.solr.search.SolrIndexSearcher]
(pool-10-thread-1) autowarming result for Searcher@5b8d745 main
   
documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}

thx
mark






Re: seeing lots of autowarming messages in log during DIH indexing

2013-05-20 Thread geeky2
you mean i would add this switch to my script that kicks off the dataimport?

example:


OUTPUT=$(curl -v
http://${SERVER}.intra.searshc.com:${PORT}/solrpartscat/${CORE}/dataimport
-F command=full-import -F clean=${CLEAN} -F commit=${COMMIT} -F
optimize=${OPTIMIZE} -F openSearcher=false)


what needs to be done _AFTER_ the DIH finishes (if anything)?

eg, does this need to be turned back on after the DIH has finished?
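
for reference - my guess is that nothing has to be turned back on; an
explicit commit after the import would open a new searcher (with warming)
exactly once.  an untested sketch, reusing my script's variables:

# explicit commit once the DIH reports done
curl -v "http://${SERVER}.intra.searshc.com:${PORT}/solrpartscat/${CORE}/update" \
     -H 'Content-type:text/xml' --data-binary '<commit/>'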







Re: having trouble storing large text blob fields - returns binary address in search results

2013-05-18 Thread geeky2
hello

your comment made me think - so i decided to double check myself.

i opened up the schema in squirrel and made sure that the two columns in
question were actually of type TEXT in the schema - check

i went in to the db-config.xml and removed all references to
ClobTransformer, removed the cast directives from the fields as well as the
clob=true on the two fields - i pasted the db-config.xml below for
reference - check

i restarted jboss - thus restarting solr - check

i went in to the solr dataimport admin screen and did a clean import - check

after the import was complete - i queried a part that i knew would have one
of the clob fields - results are pasted below as well - you can see the
binary address in the field.


<?xml version="1.0"?>
<result name="response" numFound="1" start="0">
  <doc>
    <str name="accessoryIndicator">N</str>
  * <str name="attributes">[B@5b372219</str> *
    <str name="availabilityStatus">PIA</str>
    <arr name="divProductTypeDesc">
      <str>Refrigerators and Freezers</str>
    </arr>
    <str name="divProductTypeId">0046</str>
    <str name="id">12001892,0046,464</str>
    <str name="itemModelDesc">VALVE, WATER</str>
    <str name="itemModelNo">12001892</str>
    <str name="itemModelNoExactMatchStr">12001892</str>
    <int name="itemType">1</int>
    <str name="otcStockIndicator">Y</str>
    <int name="partCnt">1</int>
    <str name="partCondition">N</str>
    <arr name="plsBrandDesc">
      <str/>
    </arr>
    <str name="plsBrandId">464</str>
    <str name="productIndicator">N</str>
    <int name="rankNo">13</int>
    <float name="sellingPrice">53.54</float>
    <str name="sourceOrderNo">464 </str>
    <str name="subbedFlag">Y</str>
  </doc>
</result>








<document>
<entity transformer="TemplateTransformer" name="core1-parts"
        query="select
        summ.*,
        1 as item_type,
        1 as part_cnt,
        '' as brand,
        mst.acy_prt_fl,
        mst.dil_tx,
        mst.hzd_mtl_typ_cd,
        mst.otc_cre_stk_fl,
        mst.prd_fl,
        mst.prt_cmt_tx,
        mst.prt_cnd_cd,
        mst.prt_inc_qt,
        mst.prt_made_by,
        mst.sug_qt,
        att.attr_val,
        rsr.rsr_val,
        case when sub.orb_itm_id is null then 'N' else 'Y' end as subbed_flag
        from
        prtxtps_prt_summ as summ
        left outer join prtxtpm_prt_mast as mst on mst.orb_itm_id =
        summ.orb_itm_id and mst.prd_gro_id = summ.prd_gro_id and mst.spp_id =
        summ.spp_id
        left outer join tmpxtpa_prt_attr as att on att.orb_itm_id =
        summ.orb_itm_id and att.prd_gro_id = summ.prd_gro_id and att.spp_id =
        summ.spp_id
        left outer join tmpxtpr_prt_rsr as rsr on rsr.orb_itm_id =
        summ.orb_itm_id and rsr.prd_gro_id = summ.prd_gro_id and rsr.spp_id =
        summ.spp_id
        left outer join tmpxtps_prt_sub as sub on sub.orb_itm_id =
        summ.orb_itm_id and sub.prd_gro_id = summ.prd_gro_id and sub.spp_id =
        summ.spp_id
        where
        summ.spp_id = '464'">

    <field column="id" name="id"
           template="${core1-parts.orb_itm_id},${core1-parts.prd_gro_id},${core1-parts.spp_id}"/>
    <field column="orb_itm_id"     name="itemModelNo"/>
    <field column="prd_gro_id"     name="divProductTypeId"/>
    <field column="ds_tx"          name="itemModelDesc"/>
    <field column="spp_id"         name="plsBrandId"/>
    <field column="rnk_no"         name="rankNo"/>
    <field column="item_type"      name="itemType"/>
    <field column="brand"          name="plsBrandDesc"/>
    <field column="prd_gro_ds"     name="divProductTypeDesc"/>
    <field column="part_cnt"       name="partCnt"/>
    <field column="avail"          name="availabilityStatus"/>
    <field column="price"          name="sellingPrice"/>
    <field column="prt_son"        name="sourceOrderNo"/>
    <field column="prt_src_cd"     name="sourceIdCode"/>
    <field column="rte_cd"         name="sourceRouteCode"/>

    <field column="acy_prt_fl"     name="accessoryIndicator"/>
    <field column="dil_tx"         name="disclosure"/>
    <field column="hzd_mtl_typ_cd" name="hazardousMaterialCode"/>
    <field column="otc_cre_stk_fl" name="otcStockIndicator"/>
    <field column="prd_fl"         name="productIndicator"/>
    <field column="prt_cmt_tx"     name="comment"/>
    <field column="prt_cnd_cd"     name="partCondition"/>
    <field column="prt_inc_qt"     name="qtyIncluded"/>
    <field column="prt_made_by"    name="madeBy"/>
    <field column="sug_qt"         name="suggestedQty"/>

    <field column="attr_val"       name="attributes"/>
    <field column="rsr_val"        name="restrictions"/>

    <field column="subbed_flag"

Re: having trouble storing large text blob fields - returns binary address in search results

2013-05-17 Thread geeky2
Hello Gora,


thank you for the reply - 

i did finally get this to work.  i had to cast the column in the DIH to a
clob - like this.

cast(att.attr_val AS clob) as attr_val,
cast(rsr.rsr_val AS clob) as rsr_val,

once this was done, the ClobTransformer worked.

to my knowledge - this particular use case and the need for the cast is not
documented anywhere.  i checked the solr wiki and searched the threads on
this forum for things like clobtransformer, informix and blob without luck. 
i also did quite a few google searches, but no luck (maybe i
missed something ;)

maybe this is just some edge case.  i also realize that informix is not
that common.

i have a question in to the solr developers list - just so i can better
understand what actually is happening, why it was necessary for the cast,
and the limitations / parameters of the ClobTransformer.  

the thread on the developers list is located here:

http://lucene.472066.n3.nabble.com/have-developer-question-about-ClobTransformer-and-DIH-td4064256.html

thx
mark








having trouble storing large text blob fields - returns binary address in search results

2013-05-16 Thread geeky2
hello 

environment: solr 3.5

can someone help me with the correct configuration for some large text blob
fields?

we have two fields in informix tables that are of type text. 

when we do a search the results for these fields come back looking like
this: 

<str name="attributes">[B@17c232ee</str>

i have tried setting them up as clob fields - but this is not working (see
details below)

i have also tried treating them as plain string fields (removing the
references to clob in the DIH) - but this does not work either.


DIH configuration:


  <entity transformer="TemplateTransformer,ClobTransformer"
          name="core1-parts" query="select
        summ.*,
        1 as item_type,
        1 as part_cnt,
        '' as brand,
        ...

    <field column="attr_val" name="attributes" clob="true"/>
    <field column="rsr_val"  name="restrictions" clob="true"/>


Schema.xml

  <field name="attributes"   type="string" indexed="false" stored="true"/>
  <field name="restrictions" type="string" indexed="false" stored="true"/>

thx
mark







Re: why does * affect case sensitivity of query results

2013-04-30 Thread geeky2
hello erik,

thank you for the info - yes - i did notice ;)

one more reason for us to upgrade from 3.5.

thx
mark






why does * affect case sensitivity of query results

2013-04-29 Thread geeky2
hello,

environment: solr 3.5


problem statement: when a query has * appended, it turns case sensitive.

assumption: query should NOT be case sensitive

actual value in database at time of index: 4387828BULK

here is a snapshot of what works and does not work.

what works:

  itemModelNoExactMatchStr:4387828bULk (and any variation of upper and lower
case letters for *bulk*)

  itemModelNoExactMatchStr:4387828bu*
  itemModelNoExactMatchStr:4387828bul*
  itemModelNoExactMatchStr:4387828bulk*


what does NOT work:

 itemModelNoExactMatchStr:4387828BU*
 itemModelNoExactMatchStr:4387828BUL*
 itemModelNoExactMatchStr:4387828BULK*


below are the specifics of my field and fieldType

<field name="itemModelNoExactMatchStr" type="text_exact" indexed="true"
       stored="true"/>


<fieldType name="text_exact" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
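
fwiw - since the wildcard term bypasses the analyzer (so no lowercasing), a
workaround sketch is to lowercase the term on the client before appending
the *.  untested; host / core names are placeholders:

TERM="4387828BULK"
LOWER=$(echo "$TERM" | tr '[:upper:]' '[:lower:]')
curl -v "http://bogus/solrpartscat/core2/select?q=itemModelNoExactMatchStr:${LOWER}*"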

thx
mark







Re: why does * affect case sensitivity of query results

2013-04-29 Thread geeky2
was looking in Smiley's book on page 129 and 130.

from the book,


No text analysis is performed on the search word containing the wildcard,
not even lowercasing. So if you want to find a word starting with Sma, then
sma* is required instead of Sma*, assuming the index side of the field's
type
includes lowercasing. This shortcoming is tracked on SOLR-219. Moreover,
if the field that you want to use the wildcard query on is stemmed in the
analysis, then smashing* would not find the original text Smashing because
the stemming process transforms this to smash. Consequently, don't stem.


thx
mark






Re: why does * affect case sensitivity of query results

2013-04-29 Thread geeky2
here is the jira link:

https://issues.apache.org/jira/browse/SOLR-219







having trouble searching on EdgeNGramFilterFactory field with a length minGramSize

2013-03-19 Thread geeky2
hello,

i am trying to debug the following query in the analyzer:

*+itemModelNoExactMatchStr:JVM1640CJ01 +plsBrandId:0432 +plsBrandDesc:ge*

the query is going against a field (plsBrandDesc) that is being indexed with 
solr.EdgeNGramFilterFactory and a  minGramSize of 3.  i have included the
complete field definition below.

after doing some experimenting in the analyzer, i believe the query may be
failing because the queried value of ge is only two (2) characters long -
and the minimum gram size is three (3) characters.

for example - this query does work in the analyzer.  it has a plsBrandDesc of
three characters and does return exactly one document:

+itemModelNoExactMatchStr:404 +plsBrandId:0431 *+plsBrandDesc:general*


i have tried overriding this behavior by using mm=2, but this does not seem
to work:

+itemModelNoExactMatchStr:JVM1640CJ01 +plsBrandId:0432 +plsBrandDesc:ge mm=2

am i misunderstanding how mm works - or am i getting the syntax for mm
incorrect?
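
for reference, my understanding is that mm is a request parameter to the
(e)dismax parser rather than part of the q string itself.  an untested
sketch (mm only counts optional clauses, so the +'s are dropped):

curl -v "http://server:port/solrpartscat/core1/select" \
     --data-urlencode "qt=modelItemNoSearch" \
     --data-urlencode "q=itemModelNoExactMatchStr:JVM1640CJ01 plsBrandId:0432 plsBrandDesc:ge" \
     --data-urlencode "mm=2"

as far as i can tell though, even with mm a two-character term still cannot
match an index that only contains 3+ character grams.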

thx
mark




<field name="plsBrandDesc" type="text_general_edge_ngram" indexed="true"
       stored="true" multiValued="true"/>


<fieldType name="text_general_edge_ngram" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms_SHC.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3"
            maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>






need general advice on how others version and mange core deployments over time

2013-03-14 Thread geeky2
hello everyone,

i know this is a general topic - but would really appreciate info from
others that are doing this now.

  - how are others managing this so that users are impacted the least 
  - how are others handling the scenario where users don't want to migrate
forward.

thx
mark








Re: question about syntax for multiple terms in filter query

2013-03-12 Thread geeky2
hello jack,

yes - i will always be using the two constraints at the same time.

thank you again for the info.

thx
mark






Re: question about syntax for multiple terms in filter query

2013-03-12 Thread geeky2
jack,

did you mean function query or filter query

i was going to do this in my request handler for parts

   <str name="fq">+itemType:1 +sellingPrice:[1 TO *]</str>





having trouble escaping a character string

2013-03-12 Thread geeky2
hello all,

i am searching on this field type:

<field name="itemModelNoExactMatchStr" type="text_exact" indexed="true"
       stored="true"/>


<fieldType name="text_exact" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

for this string: 30326R-26 TILLER

when i use the analyzer and issue the query - it indicates success (please
see attached screen shot)

but when i issue the search url - it does not return a document

http://bogus/solrpartscat/core2/select?qt=modelItemNoSearch&q=itemModelNoExactMatchStr:%2230326R-26%22%20TILLER%22

can someone tell me what i am missing?

thx
mark


http://lucene.472066.n3.nabble.com/file/n4046796/temp1.bmp 









Re: having trouble escaping a character string

2013-03-12 Thread geeky2
attempting to upload the screenshot bmp file.  the embedded image is
difficult to make out.

temp1.bmp http://lucene.472066.n3.nabble.com/file/n4046798/temp1.bmp  





Re: having trouble escaping a character string

2013-03-12 Thread geeky2
oh - 

now i see what i was doing wrong.


i kept trying to use the hex code of %22 as a replacement for the double
quote - but that was not working - 

thank you jack,

mark






question about syntax for multiple terms in filter query

2013-03-11 Thread geeky2
hello everyone,

i have a question on the filter query syntax for multiple terms, after
reading this:

http://wiki.apache.org/solr/CommonQueryParameters#fq

i see from the above that two (2) syntax constructs are supported

fq=term1:foo&fq=term2:bar

and

fq=+term1:foo +term2:bar

is there a reason why i would want to use one syntax over the other?

does the first syntax support the and operand as well as the &?
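
for reference, here is how i understand the two forms would look on the wire
(URL-encoded, %2B being the +); host / core names are placeholders:

# two separate fq parameters (implicitly ANDed, each cached on its own)
curl -v "http://server:port/dir/core1/select?q=*:*&fq=term1:foo&fq=term2:bar"

# one fq with two mandatory clauses (cached as a single filter entry)
curl -v "http://server:port/dir/core1/select?q=*:*&fq=%2Bterm1:foo+%2Bterm2:bar"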

thx
mark






Re: question about syntax for multiple terms in filter query

2013-03-11 Thread geeky2
otis and jack - 

thank you VERY much for the feedback - 

jack - 


use a single fq containing two mandatory
clauses if those clauses appear together often


this is the use case i  have to account for - eg, 

right now i have this in my request handler

 <requestHandler name="partItemNoSearch" class="solr.SearchHandler"
                 default="false">
  ...
  <str name="fq">itemType:1</str>
  ...
 </requestHandler>

which says - i only want parts 

but i need to augment the filter so only parts that have a price >= 1.0 are
returned from the request handler

so i believe i need to have this in the RH
 <requestHandler name="partItemNoSearch" class="solr.SearchHandler"
                 default="false">
  ...
  <str name="fq">+itemType:1 +sellingPrice:[1 TO *]</str>
  ...
 </requestHandler>

thx
mark









searching for q terms that start with a dash/hyphen being interpreted as prohibited clauses

2013-01-17 Thread geeky2
hello

environment: solr 3.5

problem statement:

i have a requirement to search for part numbers that start with a dash /
hyphen.

example q= term: *-0004A-0436*

example query:

http://some_url:some_port/some_core/select?facet=false&sort=score+desc%2C+rankNo+asc%2C+partCnt+desc&start=0&q=*-0004A-0436*+itemType%3A1&wt=xml&qt=itemModelNoProductTypeBrandSearch&rows=4

what is happening: query is returning a huge results set.  in reality there
is one (1) and only one record in the database with this part number.

i believe this is happening because the dash is being interpreted by the
query parser as a prohibited clause and the effective result is, give me
everything that does NOT have this part number.

how is this handled so that the search is conducted for the actual part:
-0004A-0436
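
fwiw, a sketch of the two approaches i am considering - escaping the leading
dash or quoting the whole term (untested):

# backslash-escape the leading dash (%5C is the url-encoded backslash)
curl -v "http://some_url:some_port/some_core/select?qt=itemModelNoProductTypeBrandSearch&q=itemModelNoExactMatchStr:%5C-0004A-0436"

# or quote the whole term (%22 is the url-encoded double quote)
curl -v "http://some_url:some_port/some_core/select?qt=itemModelNoProductTypeBrandSearch&q=itemModelNoExactMatchStr:%22-0004A-0436%22"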

thx
mark

more information:

request handler in solrconfig.xml

  <requestHandler name="itemModelNoProductTypeBrandSearch"
                  class="solr.SearchHandler" default="false">
    <lst name="defaults">
      <str name="defType">edismax</str>
      <str name="echoParams">all</str>
      <int name="rows">10</int>
      <str name="qf">itemModelNoExactMatchStr^30 itemModelNo^.9
divProductTypeDesc^.8 plsBrandDesc^.5</str>
      <str name="q.alt">*:*</str>
      <str name="sort">score desc, rankNo desc, partCnt desc</str>
      <str name="facet">true</str>
      <str name="facet.field">itemModelDescFacet</str>
      <str name="facet.field">plsBrandDescFacet</str>
      <str name="facet.field">divProductTypeIdFacet</str>
    </lst>
    <lst name="appends">
    </lst>
    <lst name="invariants">
    </lst>
  </requestHandler>


field information from schema.xml (if helpful)

<field name="itemModelNoExactMatchStr" type="text_general_trim"
       indexed="true" stored="true"/>

<field name="itemModelNo" type="text_en_splitting" indexed="true"
       stored="true" omitNorms="true"/>

<field name="divProductTypeDesc" type="text_general_edge_ngram"
       indexed="true" stored="true" multiValued="true"/>

<field name="plsBrandDesc" type="text_general_edge_ngram" indexed="true"
       stored="true" multiValued="true"/>


<fieldType name="text_general_trim" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="text_en_splitting" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>

    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="\."
            replacement="" replace="all"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3"
            maxGramSize="15" side="front"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
            preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="text_general_edge_ngram" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms_SHC.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3"
            maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>








performing a boolean query (OR) with a large number of terms

2013-01-09 Thread geeky2
hello,

environment: solr 3.5

i have a requirement to perform a boolean query (like the example below)
with a large number of terms.

the number of terms could be 15 or possibly larger.

after looking over several threads and the smiley book - i think i just have
to include the parens and string all of the terms together with ORs

i just want to make sure that i am not missing anything.

is there a better or more efficient way of doing this?

http://server:port/dir/core1/select?qt=modelItemNoSearch&q=itemModelNoExactMatchStr:%285-100-NGRT7%20OR%205-10-10MS7%20OR%20404%29&rows=30&debugQuery=on&rows=40
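
fwiw, a rough sketch of how i would build the OR list from a file of part
numbers (untested; the file name is made up):

# part_numbers.txt holds one part number per line
TERMS=$(paste -sd'|' part_numbers.txt | sed 's/|/ OR /g')
curl -v "http://server:port/dir/core1/select" \
     --data-urlencode "qt=modelItemNoSearch" \
     --data-urlencode "q=itemModelNoExactMatchStr:(${TERMS})" \
     --data-urlencode "rows=30"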


thx
mark






Re: is it possible to save the search query?

2012-11-20 Thread geeky2
Hello,

i think you are asking two questions here - i'll see if i can give you some
simple examples for both

1) how can i pull data from a solr search result set and compare it to
another for analysis?

one way - might be to drive the results in to files and then use xslt to
extract relevant information.

here is an example xslt file that pulls specific fields from a result:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="response/result">
<xsl:for-each select="doc">
<xsl:text>[</xsl:text>
<xsl:value-of select="str[@name='itemNo']"/>
<xsl:text>]</xsl:text>
<xsl:text>,</xsl:text>
<xsl:value-of select="float[@name='score']"/>
<xsl:text>,</xsl:text>
<xsl:value-of select="int[@name='rankNo']"/>
<xsl:text>,</xsl:text>
<xsl:value-of select="int[@name='partCnt']"/>
<xsl:text>&#10;</xsl:text>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
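
to apply the stylesheet (saved here as fields.xsl - the file name and
host / core are made up), something like:

curl -s "http://server:port/dir/core1/select?q=9030&rows=2000&fl=*,score" \
    | xsltproc fields.xsl -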



2) how can i embed data in to a solr query, making it easier to do analysis
in the log files?

here is a simple example that bookmarks or brackets transactions in the
logs - used only during stress testing

#!/bin/bash

TYPE=$1
TAG=$2

if [ "$TYPE" == "1" ]
then
    # beginning
    curl -v "http://something:1234/boo/core1/select/?q=partImageURL%3A${TAG}-test-begin&version=2.2&start=0&rows=777&indent=on"
else
    # end
    curl -v "http://something:1234/boo/core1/select/?q=partImageURL%3A${TAG}-test-end&version=2.2&start=0&rows=777&indent=on"
fi


hopefully this will give you something to start with.

thx
mark






Re: How do I best detect when my DIH load is done?

2012-11-19 Thread geeky2
Hello Andy,

i had a similar question on this some time ago.

http://lucene.472066.n3.nabble.com/possible-status-codes-from-solr-during-a-DIH-data-import-process-td3987110.html#a3987123

http://lucene.472066.n3.nabble.com/need-input-lessons-learned-or-best-practices-for-data-imports-td3801327.html#a3803658

i ended up writing my own shell based polling application that runs from our
*nix batch server that handles all of our Control-M work.

+1 on the idea of making this a more formal part of the API.

let me know if you want concrete example code.
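
fwiw, a stripped-down, untested sketch of the polling approach (host / core
names are made up):

#!/bin/bash
# poll the DIH status command until the import goes idle, then report
STATUS_URL="http://${SERVER}:${PORT}/somecore/dataimport?command=status"

while true; do
    OUTPUT=$(curl -s "$STATUS_URL")
    echo "$OUTPUT" | grep -q '<str name="status">idle</str>' && break
    sleep 30
done

# crude success / failure check on the final status payload
echo "$OUTPUT" | grep -E 'Indexing (completed|failed)'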







RE: How do I best detect when my DIH load is done?

2012-11-19 Thread geeky2
James,

was it you (cannot remember) that replied to one of my queries on this
subject and mentioned that there was consideration being given to cleaning
up the response codes to remove ambiguity?







Re: large text blobs in string field

2012-11-05 Thread geeky2
Gora,

currently our core does use multi-valued fields.  however, the existing
multi-valued fields in the schema will only result in 3 - 10 values.

we are thinking of using the text blob approach primarily because of the
large number of possible values in this field.  

if we were to use a multi-valued field, it is likely that the MV field would
have 200+ values and in some edge cases 400+ values.

are you saying that the MV field approach to represent the data (given the
scale previously indicated) is the best design solution?








Re: large text blobs in string field

2012-11-05 Thread geeky2
Erick,

thanks for the insight.

FWIW and to add to the context of this discussion,

if we do decide to add the previously mentioned content as a multivalued
field,  we would likely use a DIH hooked to our database schema (this is
currently how we add ALL content to our core) and within the DIH, use a
sub-entity to pull the many rows for each parent row.

thx
mark






large text blobs in string field

2012-11-02 Thread geeky2
hello 

environment - solr 3.5

i would like to know if anyone is using the technique of placing large text
blobs into a non-indexed string field and if so - are there any good/bad
aspects to consider?

we are thinking of doing this to represent a 1:M relationship, with the
many side represented as a string in the schema (probably composed of
xml or json objects).

we are looking at the classic part : model scenario, where the client would
look up a part and the document would contain a string field with
potentially 200+ model numbers.  edge cases for this could be 400+ model
numbers.

thx

 





Re: need help with exact match search

2012-10-22 Thread geeky2
hello jack,

that was it!

thx
mark






Re: need help with exact match search

2012-10-19 Thread geeky2
hello jack,

thank you very much for the reply - i will re-test and let you know.

really appreciate it ;)

thx
mark






Re: need help understanding an issue with scoring

2012-08-28 Thread geeky2
Chris, Jack,

thank you for the detailed replies and help ;)








Re: need help understanding an issue with scoring

2012-08-23 Thread geeky2
update:

as an experiment - i changed the query to a wildcard (9030*) instead of an
explicit value (9030)

example:

QUERY="http://$SERVER.intra.searshc.com:${PORT}/solrpartscat/core1/select?qt=itemNoProductTypeBrandSearch&q=9030*&rows=2000&debugQuery=on&fl=*,score"

this resulted in a results list that appears much more rational from a sort
order perspective -

however - the wildcard query is not acceptable from a performance stand
point.

any input or illumination would be appreciated ;)

thank you

itemNo, score, rankNo, partCnt

  [9030],1.0,10353,1
[90302   ],1.0,6849,1
[9030P   ],1.0,444,1
[903093  ],1.0,51,1
[9030430 ],1.0,47,1
[9030],1.0,37,1
[903057-9010 ],1.0,26,1
[903061-9010 ],1.0,20,1
[903046-9010 ],1.0,18,1
[903056-9010 ],1.0,14,1
[903095  ],1.0,14,1
[90303-MR1-000   ],1.0,14,1
[903097-9050 ],1.0,12,1
[903046-9011 ],1.0,12,1
[903097-9010 ],1.0,11,1
[903097-9040 ],1.0,11,1
[903063-9100 ],1.0,6,1
[903066-9011 ],1.0,6,1
[903098  ],1.0,3,1






Re: need help understanding an issue with scoring

2012-08-23 Thread geeky2
looks like the original complete list of the results did not get attached to
this thread 

here is a snippet of the list.

what i am trying to demonstrate, is the difference in scoring and
ultimately, sorting - and the breadth of documents (a few hundred) between
the two documents of interest (9030 and 90302)

thank you,

itemNo, score, rankNo, partCnt

  [9030],12.014701,10353,1
[9030],12.014701,37,1
[9030],12.014701,1,1
[9030   ],12.014701,0,167
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[PC-9030],7.509188,0,169
[58-9030 ],7.509188,0,1
[9030-1R ],7.509188,0,1
[903028-9030 ],7.509188,0,1
[903139-9030 ],7.509188,0,1
[903091-9030 ],7.509188,0,1
[903099-9030 ],7.509188,0,1
[903153-9030 ],7.509188,0,1
[031-9030],7.509188,0,1
[308-9030],7.509188,0,1
[9030-6010   ],7.509188,0,1
[9030-6010   ],7.509188,0,1
[9030-6006   ],7.509188,0,1
[9030-6008   ],7.509188,0,1
[9030-6008   ],7.509188,0,1
[9030-6001   ],7.509188,0,1
[9030-6003   ],7.509188,0,1
[9030-6006   ],7.509188,0,1
[208568-9030 ],7.509188,0,1
[79-9030 ],7.509188,0,1
[33-9030 ],7.509188,0,1
[M-9030  ],7.509188,0,1

... a few hundred more ...

[LGQ9030PQ1 ],0.41475832,0,150
[LEQ9030PQ0 ],0.41475832,0,124
[LEQ9030PQ1 ],0.41475832,0,123
[CWE9030BCE ],0.41475832,0,115
[PJDS9030Z   ],0.29327843,0,1
[8A-CT9-030-010  ],0.29327843,0,1
[RDT9030A],0.29327843,0,1
[PJDG9030Z   ],0.29327843,0,1
[90302   ],0.20737916,6849,1





Re: Holy cow do I love 4.0's admin screen

2012-08-23 Thread geeky2
Andy,

we are not running solr 4.0 here in production.

can you elaborate on your comment related to your polling script written in
ruby and how the new data import status screen makes your polling app
obsolete?

i wrote my own polling app (in shell) to work around the very same issues:

http://lucene.472066.n3.nabble.com/possible-status-codes-from-solr-during-a-DIH-data-import-process-td3987110.html

thx for the post





using tie parameter of edismax to raise a score (disjunction max query)?

2012-08-23 Thread geeky2

Hello all,

this more specific question is related to my earlier post at:
http://lucene.472066.n3.nabble.com/need-help-understanding-an-issue-with-scoring-td4002897.html

i am reading here about the tie parameter:
http://wiki.apache.org/solr/ExtendedDisMax?highlight=%28edismax%29#tie_.28Tie_breaker.29

*can i use the edismax, tie= parameter, to raise the following score?*

my goal is to raise the total score of this document (see score snippet
below) to 9.11329.

to do this - would i use tie=0.0 to make a pure disjunction max query --
only the maximum scoring sub query contributes to the final score?


  <str name="90302   ,0046,046">
*0.20737723* = (MATCH) max of:
  0.20737723 = (MATCH) weight(itemNo:9030^0.9 in 1796597), product of:
    0.022755474 = queryWeight(itemNo:9030^0.9), product of:
      0.9 = boost
      9.11329 = idf(docFreq=2565, maxDocs=8566704)
      0.0027743944 = queryNorm
    *9.11329* = (MATCH) fieldWeight(itemNo:9030 in 1796597), product of:
      1.0 = tf(termFreq(itemNo:9030)=1)
      9.11329 = idf(docFreq=2565, maxDocs=8566704)
      1.0 = fieldNorm(field=itemNo, doc=1796597)
  </str>
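
for reference, my understanding is that tie is passed as a request parameter
(or set in the handler defaults); tie=0.0 is the pure disjunction-max case
where only the highest-scoring subquery counts, while values toward 1.0 add
the other subqueries' scores in on top of the max.  a rough, untested
example (host / port are placeholders):

curl -v "http://server:port/solrpartscat/core1/select?qt=itemNoProductTypeBrandSearch&q=9030&tie=0.0&debugQuery=on"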

thank you










need help understanding an issue with scoring

2012-08-23 Thread geeky2
hello,

i am trying to understand the debug output from a query, and specifically
- how scores for two (2) documents are derived and why they are so far
apart.

the user is entering 9030 for the search

the search is rightfully returning the top document, however - the question
is why is the document with id 90302 so far down on the list.  

i have attached a text file i generated with xslt, pulling the document
information.  the text file has the itemNo, the rankNo and the partCnt.  the
sort order of the response handler is:

  <str name="sort">score desc, rankNo desc, partCnt desc</str>



if you look at the text file - you will see that 90302 is 174'th on the
list!  90302 has a rankNo of 6849 - and i would think that would drive it
much higher on the list and therefore much closer to 9030.

what is happening from a business perspective is this: 9030 is one of our top
selling parts, as is 90302.  they need to be closer together in the results
instead of separated by 170+ documents that have a rankNo of 0.

i have also CnP the response handler that is being used - below

can someone help me understand the scoring so i can correct this?

this is the scoring for the two documents:

  <str name="9030,0046,046">
12.014634 = (MATCH) max of:
  0.20737723 = (MATCH) weight(itemNo:9030^0.9 in 2308681), product of:
    0.022755474 = queryWeight(itemNo:9030^0.9), product of:
      0.9 = boost
      9.11329 = idf(docFreq=2565, maxDocs=8566704)
      0.0027743944 = queryNorm
    9.11329 = (MATCH) fieldWeight(itemNo:9030 in 2308681), product of:
      1.0 = tf(termFreq(itemNo:9030)=1)
      9.11329 = idf(docFreq=2565, maxDocs=8566704)
      1.0 = fieldNorm(field=itemNo, doc=2308681)
  12.014634 = (MATCH) fieldWeight(itemNoExactMatchStr:9030 in 2308681),
product of:
    1.0 = tf(termFreq(itemNoExactMatchStr:9030)=1)
    12.014634 = idf(docFreq=140, maxDocs=8566704)
    1.0 = fieldNorm(field=itemNoExactMatchStr, doc=2308681)
  </str>




  <str name="90302   ,0046,046">
0.20737723 = (MATCH) max of:
  0.20737723 = (MATCH) weight(itemNo:9030^0.9 in 1796597), product of:
    0.022755474 = queryWeight(itemNo:9030^0.9), product of:
      0.9 = boost
      9.11329 = idf(docFreq=2565, maxDocs=8566704)
      0.0027743944 = queryNorm
    9.11329 = (MATCH) fieldWeight(itemNo:9030 in 1796597), product of:
      1.0 = tf(termFreq(itemNo:9030)=1)
      9.11329 = idf(docFreq=2565, maxDocs=8566704)
      1.0 = fieldNorm(field=itemNo, doc=1796597)
  </str>


  <requestHandler name="itemNoProductTypeBrandSearch"
                  class="solr.SearchHandler" default="false">
    <lst name="defaults">
      <str name="defType">edismax</str>
      <str name="echoParams">all</str>
      <int name="rows">10</int>
      <str name="qf">itemNoExactMatchStr^30 itemNo^.9 divProductTypeDesc^.8
brand^.5</str>
      <str name="q.alt">*:*</str>
      <str name="sort">score desc, rankNo desc, partCnt desc</str>
      <str name="facet">true</str>
      <str name="facet.field">itemDescFacet</str>
      <str name="facet.field">brandFacet</str>
      <str name="facet.field">divProductTypeIdFacet</str>
    </lst>
    <lst name="appends">
    </lst>
    <lst name="invariants">
    </lst>
  </requestHandler>
 
thank you for any help






Re: need help understanding an issue with scoring

2012-08-23 Thread geeky2
hello,


this is the query i am using:

 cat goquery.sh
#!/bin/bash

SERVER=$1
PORT=$2


QUERY="http://$SERVER.blah.blah.com:${PORT}/solrpartscat/core1/select?qt=itemNoProductTypeBrandSearch&q=9030&rows=2000&debugQuery=on&fl=*,score"

curl -v $QUERY






need help understanding times used in dataimport?command=status

2012-07-11 Thread geeky2
hello all,

i noticed something in one of our logs that periodically polls the status of
a data import.

can someone help me understand where / how the times for "Full Dump
Started" are derived?

here it shows the dataimport dump starting at 1:32


<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="initArgs">
    <lst name="defaults">
      <str name="config">db-data-config.xml</str>
    </lst>
  </lst>
  <str name="command">status</str>
  <str name="status">busy</str>
  <str name="importResponse">A command is still running...</str>
  <lst name="statusMessages">
    <str name="Time Elapsed">0:0:8.182</str>
    <str name="Total Requests made to DataSource">2</str>
    <str name="Total Rows Fetched">18834</str>
    <str name="Total Documents Processed">18818</str>
    <str name="Total Documents Skipped">0</str>
    *<str name="Full Dump Started">2012-07-11 01:32:18</str>*
  </lst>
  <str name="WARNING">This response format is experimental. It is likely to
change in the future.</str>
</response>



however - here it shows the dump starting at 2:17

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="initArgs">
    <lst name="defaults">
      <str name="config">db-data-config.xml</str>
    </lst>
  </lst>
  <str name="command">status</str>
  <str name="status">busy</str>
  <str name="importResponse">A command is still running...</str>
  <lst name="statusMessages">
    <str name="Time Elapsed">0:45:8.373</str>
    <str name="Total Requests made to DataSource">3</str>
    <str name="Total Rows Fetched">8138060</str>
    <str name="Total Documents Skipped">0</str>
    *<str name="Full Dump Started">2012-07-11 02:17:11</str>*
  </lst>
  <str name="WARNING">This response format is experimental. It is likely to
change in the future.</str>
</response>




<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="initArgs">
    <lst name="defaults">
      <str name="config">db-data-config.xml</str>
    </lst>
  </lst>
  <str name="command">status</str>
  <str name="status">idle</str>
  <str name="importResponse"/>
  <lst name="statusMessages">
    <str name="Total Requests made to DataSource">3</str>
    <str name="Total Rows Fetched">8528239</str>
    <str name="Total Documents Skipped">0</str>
    *<str name="Full Dump Started">2012-07-11 02:17:11</str>*
    <str name="">Indexing completed. Added/Updated: 8464051 documents.
Deleted 0 documents.</str>
    *<str name="Committed">2012-07-11 02:21:17</str>*
    *<str name="Optimized">2012-07-11 02:21:17</str>*
    <str name="Total Documents Processed">8464051</str>
    *<str name="Time taken ">0:48:58.712</str>*
  </lst>
  <str name="WARNING">This response format is experimental. It is likely to
change in the future.</str>
</response>





maxNumberOfBackups does not cleanup - jira 3361

2012-07-10 Thread geeky2

environment: solr 3.5

hello all,

i have a question on this jira -
https://issues.apache.org/jira/browse/SOLR-3361

the jira states that, with backupAfter=commit, the backups do not get
cleaned up

however - we are noticing this same issue in our environment, when using
optimize.

can someone confirm that this bug applies to optimize as well?

thank you



example:


  <str name="backupAfter">optimize</str>




RE: maxNumberOfBackups does not cleanup - jira 3361

2012-07-10 Thread geeky2
thank you James - that is good to know.

for the short-term we'll just use cron and kill backup directories that are
older than x.

for the long-term, we'll just migrate to 4.0

thanks again




avgTimePerRequest JMX M-Bean displays with NaN instead of 0 - when no activity

2012-06-28 Thread geeky2
hello all,

environment: solr 3.5, jboss, wily

we have been setting up jmx monitoring for our solr installation.

while running tests - i noticed that of the 6 JMX M-Beans
(avgRequestsPerSecond, avgTimePerRequest, errors, requests, timeouts,
totalTime) ...

the avgTimePerRequest M-Bean was producing NaN when there was no search
activity.

all of the other M-Beans displayed a 0 (zero) when there was no search
activity.

we were able to compensate for this issue with custom scripting in wily on
our side.

can someone help me understand this inconsistency?

is this just WAD (works as designed)?

thanks for any help or insight





question about jmx value (avgRequestsPerSecond) output from solr

2012-06-27 Thread geeky2
hello all,

environment: centOS, solr 3.5, jboss 5.1

i have been using wily (a monitoring tool) to instrument our solr instances
in our stress environment.

can someone help me to understand something about the jmx values being
output from solr?  please note - i am new to JMX.

problem / issue statement: for a given request handler (partItemDescSearch),
i see output from the jmx MBean for the metric avgRequestsPerSecond - AFTER
my test harness has completed and there is NO request activity taking place
against this request handler (verified in solr log files).

example scenario during testing:  during a test run - the test harness will
fire requests at request handler (partItemDescSearch) and all numbers look
fine.   then after the test harness is done - the metric
avgRequestsPerSecond does not immediately drop to 0.  instead - it appears
as if JMX is somehow averaging this metric and gradually trending it
downward toward 0.

continual checking of this metric (in the JMX tree - see screen shot) shows
the number trending downward instead of a hard stop at 0.

is this behavior - just the way jmx works?

thanks mark

http://lucene.472066.n3.nabble.com/file/n3991616/test1.bmp 






Re: seeing errors during replication process on slave boxes - read past EOF

2012-06-04 Thread geeky2
hello,

i have shell scripts that handle all of the operational tasks.  

example:

curl -v http://${SERVER}.bogus.com:${PORT}/somecore/dataimport -F
command=full-import -F clean=${CLEAN} -F commit=${COMMIT} -F
optimize=${OPTIMIZE}




seeing errors during replication process on slave boxes - read past EOF

2012-06-03 Thread geeky2
hello all,

environment: solr 3.5

1 - master
2 - slave

slaves are set to poll master every 10 minutes.

i have had replication running on one master and two slaves - for a few
weeks now.  these boxes are not production boxes - just QA/test boxes.

right after i started a re-index on the master - i started to see the
following errors on both of the slave boxes.

in previous test runs - i have not noticed any errors.

can someone help me understand what is causing these errors?

thank you,

2012-06-03 19:30:23,104 INFO  [org.apache.solr.update.UpdateHandler]
(pool-16-thread-1) start
commit(optimize=false,waitFlush=true,waitSearcher=true,expungeDeletes=false)
2012-06-03 19:30:23,164 SEVERE [org.apache.solr.handler.ReplicationHandler]
(pool-16-thread-1) SnapPull failed
org.apache.solr.common.SolrException: Index fetch failed :
at
org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:331)
at
org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:268)
at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at
java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.RuntimeException: java.io.IOException: read past EOF:
MMapIndexInput(path=/appl/solr/stress/partcatalog/index/core1/index.20120514101522/_5kgm.fdx)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1103)
at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:418)
at org.apache.solr.handler.SnapPuller.doCommit(SnapPuller.java:470)
at
org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:321)
... 11 more
Caused by: java.io.IOException: read past EOF:
MMapIndexInput(path=/appl/solr/stress/partcatalog/index/core1/index.20120514101522/_5kgm.fdx)
at
org.apache.lucene.store.MMapDirectory$MMapIndexInput.readByte(MMapDirectory.java:279)
at org.apache.lucene.store.DataInput.readInt(DataInput.java:84)
at
org.apache.lucene.store.MMapDirectory$MMapIndexInput.readInt(MMapDirectory.java:315)
at org.apache.lucene.index.FieldsReader.<init>(FieldsReader.java:138)
at
org.apache.lucene.index.SegmentCoreReaders.openDocStores(SegmentCoreReaders.java:212)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:117)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:93)
at
org.apache.lucene.index.DirectoryReader.&lt;init&gt;(DirectoryReader.java:235)
at
org.apache.lucene.index.ReadOnlyDirectoryReader.&lt;init&gt;(ReadOnlyDirectoryReader.java:34)
at
org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:506)
at
org.apache.lucene.index.DirectoryReader.access$000(DirectoryReader.java:45)
at
org.apache.lucene.index.DirectoryReader$2.doBody(DirectoryReader.java:498)
at
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:754)
at
org.apache.lucene.index.DirectoryReader.doOpenNoWriter(DirectoryReader.java:493)
at
org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:450)
at
org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:396)
at
org.apache.lucene.index.IndexReader.openIfChanged(IndexReader.java:520)
at org.apache.lucene.index.IndexReader.reopen(IndexReader.java:697)
at
org.apache.solr.search.SolrIndexReader.reopen(SolrIndexReader.java:414)
at
org.apache.solr.search.SolrIndexReader.reopen(SolrIndexReader.java:425)
at
org.apache.solr.search.SolrIndexReader.reopen(SolrIndexReader.java:35)
at
org.apache.lucene.index.IndexReader.openIfChanged(IndexReader.java:501)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1083)
... 14 more
2012-06-03 19:30:23,197 INFO  [org.apache.solr.handler.SnapPuller]
(pool-12-thread-1) Skipping download for
/appl/solr/stress/partcatalog/index/core1/index.20120514101522/_5kiq.tis
2012-06-03 19:30:23,198 INFO  [org.apache.solr.handler.SnapPuller]
(pool-12-thread-1) Skipping download for
/appl/solr/stress/partcatalog/index/core1/index.20120514101522/_5kit.tis
2012-06-03 19:30:23,198 INFO  [org.apache.solr.handler.SnapPuller]
(pool-12-thread-1) Skipping download for
/appl/solr/stress/partcatalog/index/core1/index.20120514101522/_5kgm.fdt
2012-06-03 19:30:23,198 INFO  

eliminate adminPath tag from solr.xml file?

2012-06-01 Thread geeky2
hello all,

referring to:

http://wiki.apache.org/solr/CoreAdmin#Core_Administration

if you wanted to eliminate administration of the core from the web site,

could you eliminate either solr.xml or remove the

<cores adminPath="/admin/cores"> tag from the solr.xml file?

thank you,




Re: possible status codes from solr during a (DIH) data import process

2012-06-01 Thread geeky2
thank you ALL for the great feedback - very much appreciated!





possible status codes from solr during a (DIH) data import process

2012-05-31 Thread geeky2
hello all,

i have been asked to write a small polling script (bash) to periodically
check the status of an import on our Master.  our import times are small,
but there are business reasons why we want to know the status of an import
after a specified amount of time.

i need to perform certain actions based on the status of the import, and
therefore need to quantify which tags to check and their appropriate states.

i am using the command from the DataImportHandler HTTP API to get the status
of the import:

OUTPUT=$(curl -v
http://${SERVER}:${PORT}/somecore/dataimport?command=status)




can someone tell me if i have these rules correct?

1) during an import - the status tag will have a busy state:

example:

  <str name="status">busy</str>

2) at the completion of an import (regardless of failure or success) the
status tag will have an idle state:

example:

  <str name="status">idle</str>


3) to determine if an import failed or succeeded - you must interrogate the
tags under <lst name="statusMessages"> and specifically look for:

success:
<str name="">Indexing completed. Added/Updated: 603378 documents. Deleted 0
documents.</str>

failure:
<str name="">Indexing failed. Rolled back all changes.</str>
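
fwiw - this is the rough shape of the polling check i have in mind (bash
sketch only; the grep patterns assume the status strings above are right):

#!/bin/bash
# poll the DIH status command until it leaves the "busy" state
STATUS_URL="http://${SERVER}:${PORT}/somecore/dataimport?command=status"

while true; do
  OUTPUT=$(curl -s "$STATUS_URL")
  # still importing?
  echo "$OUTPUT" | grep -q '<str name="status">busy</str>' || break
  sleep 30
done

# now idle - inspect statusMessages to decide success / failure
if echo "$OUTPUT" | grep -q 'Indexing completed'; then
  echo "import succeeded"
else
  echo "import failed (or was aborted)"
  exit 1
fi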

thank you,




need to verify my understanding of default value of mm (minimum match) for edismax

2012-05-24 Thread geeky2
environment: solr 3.5
default operator is OR

i want to make sure i understand how the mm param(minimum match) works for
the edismax parser

http://wiki.apache.org/solr/ExtendedDisMax?highlight=%28dismax%29#mm_.28Minimum_.27Should.27_Match.29

it looks like the rule is 100% of the terms must match across the fields,
unless i override this with the mm=x param - do i have this right?

what i am seeing is a query that matches on:

q=singer sewing 9010

will fail if it is changed to:

q=singer sewing machine 9010

for the second query - if i add mm=3 - then it comes back with results
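
(for reference, this is roughly how i am testing it from the command line -
host / core / field names are placeholders:)

curl "http://localhost:8983/solr/somecore/select?defType=edismax&qf=itemDesc&q=singer+sewing+machine+9010&mm=3&debugQuery=on"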

thank you




index-time boosting using DIH

2012-05-22 Thread geeky2
hello all,

can i use the technique described on the wiki at:

http://wiki.apache.org/solr/SolrRelevancyFAQ#index-time_boosts

if i am populating my core using a DIH?

looking at the posts on this subject and the wiki docs - leads me to believe
that you can only use this when you are using the xml interface for
importing data?

thank you



RE: index-time boosting using DIH

2012-05-22 Thread geeky2
thanks for the reply,

so to use the $docBoost pseudo-field name, would you do something like below
- and would this technique likely increase my total index time?



<dataConfig>
  <dataSource ... />

  <document name="mydoc">
    <entity name="myentity"
            transformer="script:BoostDoc"
            query="select ...">

      <field column="SOME_COLUMN" name="someField" />
      ...



RE: index-time boosting using DIH

2012-05-22 Thread geeky2
thank you james for the feedback - i appreciate it.

ultimately - i was trying to decide if i was missing the boat by ONLY using
query time boosting, and i should really be using index time boosting.

but after your reply, reading the solr book, and looking at the lucene dox -
it looks like index-time boosting is not what i need.  i can probably do
better by using query-time boosting and the proper sort params.

thanks again



need help with getting exact matches to score higher

2012-05-15 Thread geeky2
Hello all,


i am trying to tune our core for exact matches on a single field (itemNo)
and having issues getting it to work.  

in addition - i need help understanding the output from debugQuery=on where
it presents the scoring.

my goal is to get exact matches to arrive at the top of the results. 
however - what i am seeing is non-exact matches arrive at the top of the
results with MUCH higher scores.



// from schema.xml - i am copying itemNo in to the string field for use in
boosting

  <field name="itemNoExactMatchStr" type="string" indexed="true"
stored="false"/>
  <copyField source="itemNo" dest="itemNoExactMatchStr"/>

// from solrconfig.xml - i have the boost set for my special exact match
field and the sorting on score desc.

  <requestHandler name="itemNoProductTypeBrandSearch"
class="solr.SearchHandler" default="false">
    <lst name="defaults">
      <str name="defType">edismax</str>
      <str name="echoParams">all</str>
      <int name="rows">10</int>
      <str name="qf">itemNoExactMatchStr^30 itemNo^.9 divProductTypeDesc^.8
brand^.5</str>
      <str name="q.alt">*:*</str>
      <str name="sort">score desc</str>
      <str name="facet">true</str>
      <str name="facet.field">itemDescFacet</str>
      <str name="facet.field">brandFacet</str>
      <str name="facet.field">divProductTypeIdFacet</str>
    </lst>
    <lst name="appends">
    </lst>
    <lst name="invariants">
    </lst>
  </requestHandler>
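
(for reference, this is how i am firing the handler to get the scoring
output - host / port are placeholders:)

curl "http://localhost:8983/solr/somecore/select?qt=itemNoProductTypeBrandSearch&q=9030&debugQuery=on"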



// analysis output from debugQuery=on

here you can see that the top score for itemNo:9030 is a part that does not
start with 9030.

the entries below (there are 4) all have exact matches - but they rank below
this part - ???



<str name="0904000,1354  ,2TTZ9030C1000A">
0.585678 = (MATCH) max of:
  0.585678 = (MATCH) weight(itemNo:9030^0.9 in 582979), product of:
0.021552926 = queryWeight(itemNo:9030^0.9), product of:
  0.9 = boost
  10.270785 = idf(docFreq=55, maxDocs=594893)
  0.0023316324 = queryNorm
27.173943 = (MATCH) fieldWeight(itemNo:9030 in 582979), product of:
  2.6457512 = tf(termFreq(itemNo:9030)=7)
  10.270785 = idf(docFreq=55, maxDocs=594893)
  1.0 = fieldNorm(field=itemNo, doc=582979)
</str>



<str name="122,1232  ,9030   ">
0.22136548 = (MATCH) max of:
  0.22136548 = (MATCH) weight(itemNo:9030^0.9 in 499864), product of:
0.021552926 = queryWeight(itemNo:9030^0.9), product of:
  0.9 = boost
  10.270785 = idf(docFreq=55, maxDocs=594893)
  0.0023316324 = queryNorm
10.270785 = (MATCH) fieldWeight(itemNo:9030 in 499864), product of:
  1.0 = tf(termFreq(itemNo:9030)=1)
  10.270785 = idf(docFreq=55, maxDocs=594893)
  1.0 = fieldNorm(field=itemNo, doc=499864)
</str>

<str name="0537220,1882  ,9030   ">
0.22136548 = (MATCH) max of:
  0.22136548 = (MATCH) weight(itemNo:9030^0.9 in 538826), product of:
0.021552926 = queryWeight(itemNo:9030^0.9), product of:
  0.9 = boost
  10.270785 = idf(docFreq=55, maxDocs=594893)
  0.0023316324 = queryNorm
10.270785 = (MATCH) fieldWeight(itemNo:9030 in 538826), product of:
  1.0 = tf(termFreq(itemNo:9030)=1)
  10.270785 = idf(docFreq=55, maxDocs=594893)
  1.0 = fieldNorm(field=itemNo, doc=538826)
</str>

<str name="0537220,2123  ,9030   ">
0.22136548 = (MATCH) max of:
  0.22136548 = (MATCH) weight(itemNo:9030^0.9 in 544313), product of:
0.021552926 = queryWeight(itemNo:9030^0.9), product of:
  0.9 = boost
  10.270785 = idf(docFreq=55, maxDocs=594893)
  0.0023316324 = queryNorm
10.270785 = (MATCH) fieldWeight(itemNo:9030 in 544313), product of:
  1.0 = tf(termFreq(itemNo:9030)=1)
  10.270785 = idf(docFreq=55, maxDocs=594893)
  1.0 = fieldNorm(field=itemNo, doc=544313)
</str>

<str name="0537220,2087  ,9030   ">
0.22136548 = (MATCH) max of:
  0.22136548 = (MATCH) weight(itemNo:9030^0.9 in 544657), product of:
0.021552926 = queryWeight(itemNo:9030^0.9), product of:
  0.9 = boost
  10.270785 = idf(docFreq=55, maxDocs=594893)
  0.0023316324 = queryNorm
10.270785 = (MATCH) fieldWeight(itemNo:9030 in 544657), product of:
  1.0 = tf(termFreq(itemNo:9030)=1)
  10.270785 = idf(docFreq=55, maxDocs=594893)
  1.0 = fieldNorm(field=itemNo, doc=544657)
</str>









doing a full-import after deleting records in the database - maxDocs

2012-05-15 Thread geeky2

hello,

After doing a DIH full-import (with clean=true) after deleting records in
the database, i noticed that the number of documents processed, did change.


example:

Indexing completed. Added/Updated: 595908 documents. Deleted 0 documents.

however, i noticed the numbers on the statistics page did not change nor do
they match the number of indexed records -


can someone help me understand the difference in these numbers and the
meaning of maxDoc / numDoc?

numDocs : 594893
maxDoc : 594893 
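
(for reference - i am reading the numbers above off the statistics page; i
assume the luke handler reports the same numDocs / maxDoc values, e.g.:)

curl "http://${SERVER}:${PORT}/somecore/admin/luke?numTerms=0"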





Re: doing a full-import after deleting records in the database - maxDocs

2012-05-15 Thread geeky2
hello 

thanks for the reply

this is the output - docsPending = 0

commits : 1786
autocommit maxDocs : 1000
autocommit maxTime : 6ms
autocommits : 1786
optimizes : 3
rollbacks : 0
expungeDeletes : 0
docsPending : 0
adds : 0
deletesById : 0
deletesByQuery : 0
errors : 0
cumulative_adds : 1787752
cumulative_deletesById : 0
cumulative_deletesByQuery : 3
cumulative_errors : 0 



not getting expected results when doing a delta import via full import

2012-05-14 Thread geeky2
hello all,


i am not getting the expected results when trying to set up delta imports
according to the wiki documentation here:

http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport?highlight=%28delta%29|%28import%29



i have the following set up in my DIH,

query="select [complicated sql goes here] and
('${dataimporter.request.clean}' != 'false' OR some_table.upd_by_ts >
'${dataimporter.last_index_time}')"

i have the following set up in the shell script to invoke my import process
(either a full w/clean or delta)

# change clean=true for full, clean=false for delta

SERVER="http://some_server:port/some_core/dataimport"

curl "$SERVER" -F command=full-import -F clean=false


when i do a full import (clean=true) i see all of the documents (via the
stats page) show up in the core.

when i do a delta import (clean=false) i see only ~900 fewer records in the
import, but i should see far fewer - only the ~84,000 records whose
upd_by_ts field i am updating to the current timestamp!

can someone tell me what i am missing?

thank you,






Re: not getting expected results when doing a delta import via full import

2012-05-14 Thread geeky2
update on this:

i also tried manipulating the timestamps in the dataimport.properties file
to advance the date so that no records could be newer than last_index_time

example:

#Mon May 14 12:42:49 CDT 2012
core1-model.last_index_time=2012-05-15 14\:38\:55
last_index_time=2012-05-15 14\:38\:55
~

this leads me to believe that date comparisons are not being done correctly
or have not been configured correctly.

so does something need to be configured for the date comparison to
work?

example from wiki:

OR last_modified > '${dataimporter.last_index_time}'





Re: should slave replication be turned off / on during master clean and re-index?

2012-05-03 Thread geeky2
thanks for all of the advice / help.

i appreciate it ;)





solr snapshots - old school and replication - new school ?

2012-05-03 Thread geeky2
hello all,

environment: centOS and solr 3.5

i want to make sure i understand the difference between  snapshots and solr
replication.

snapshots are "old school" and have been deprecated in favor of solr
replication - "new school".

do i have this correct?

btw: i have replication working (now), between my master and two slaves - i
just want to make sure i am not missing a larger picture ;)

i have been reading the Smiley Pugh book (pg 349) as well as material on the
wiki at:

http://wiki.apache.org/solr/SolrCollectionDistributionScripts

http://wiki.apache.org/solr/SolrReplication


thank you,





Re: should slave replication be turned off / on during master clean and re-index?

2012-05-01 Thread geeky2
hello shawn,

thanks for the reply.

ok - i did some testing and yes you are correct.  

autocommit is doing the commit work in chunks. yes - the slaves are also
going from having everything to nothing, then slowly building back up again,
lagging behind the master.

... and yes - this is probably not what we need - as far as a replication
strategy for the slaves.

you said, you don't use autocommit.  if so - then why don't you use / like
autocommit?

since we have not done this here - there is no established reference point,
from an operations perspective.

i am looking to formulate some sort of operation strategy, so ANY ideas or
input is really welcome.



it seems to me that we have to account for two operational strategies - 

the first operational mode is a daily append to the solr core after the
database tables have been updated.  this can probably be done with a simple
delta import.  i would think that autocommit could remain on for the master
and replication could also be left on so the slaves picked up the changes
ASAP.  this seems like the mode that we would / should be in most of the
time.


the second operational mode would be a build from scratch mode, where
changes in the schema necessitated a full re-index of the data.  given that
our site (powered by solr) must be up all of the time, and that our full
index time on the master (for the moment) is hovering somewhere around 16
hours - it makes sense that some sort of parallel path - with a cut-over,
must be used.

in this situation is it possible to have the indexing process going on in
the background - then have one commit at the end - then turn replication on
for the slaves?

are there disadvantages to this approach?

also - i really like your suggestion of a build core and live core.  is
this the approach you use?
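
(for the build core / live core idea - i am guessing the cut-over would be a
coreadmin SWAP, something like the following - core names are made up:)

curl "http://${SERVER}:${PORT}/solr/admin/cores?action=SWAP&core=build&other=live"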

thank you for all of the great input








dataimport handler (DIH) - notify when it has finished?

2012-05-01 Thread geeky2
Hello all,

is there a notification / trigger / callback mechanism people use that
allows them to know when a dataimport process has finished?

we will be doing daily delta-imports and i need some way for an operations
group to know when the DIH has finished.
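
(the crude approach i can think of is to block on the DIH status command and
then fire off a mail - rough sketch, assuming mailx is available:)

STATUS_URL="http://${SERVER}:${PORT}/somecore/dataimport?command=status"
until curl -s "$STATUS_URL" | grep -q '<str name="status">idle</str>'; do
  sleep 60
done
echo "DIH import finished on ${SERVER}" | mailx -s "solr import done" ops@example.com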

thank you,





should slave replication be turned off / on during master clean and re-index?

2012-04-27 Thread geeky2
hello all,

i am just getting replication going on our master and two (2) slaves.

from time to time, i may need to do a complete re-index and clean on the
master.

should replication on the slave - remain On or Off during a full clean and
re-index on the Master?

thank you,



Re: should slave replication be turned off / on during master clean and re-index?

2012-04-27 Thread geeky2
hello,

thank you for the reply,


Does a clean mean issuing a deletion query (e.g.
<delete><id>*:*</id></delete>) prior to re-indexing all of your content?  I
don't think the slaves will download any changes until you've committed at
some point on the master.


well, in this case when i say, clean  (on the Master), i mean selecting
the Full Import with Cleaning button from the DataImportHandler
Development Console page in solr.  at the top of the page, i have the check
boxes selected for verbose and clean (*but i don't have the commit checkbox
selected*).

by doing the above process - doesn't this issue a deletion query - then
start the import?

and as a follow-up - when actually is the commit being done?


here is my from my solrconfig.xml file on the master

  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxTime>6</maxTime>
      <maxDocs>1000</maxDocs>
    </autoCommit>
    <maxPendingDeletes>10</maxPendingDeletes>
  </updateHandler>








Re: solr replication failing with error: Master at: is not available. Index fetch failed

2012-04-26 Thread geeky2
hello,

sorry - i overlooked this message - thanks for checking back and thanks for
the info.

yes - replication seems to be working now:

tailed from logs just now:

2012-04-26 09:21:33,284 INFO  [org.apache.solr.handler.SnapPuller]
(pool-12-thread-1) Slave in sync with master.
2012-04-26 09:21:53,279 INFO  [org.apache.solr.handler.SnapPuller]
(pool-12-thread-1) Slave in sync with master.
2012-04-26 09:22:13,279 INFO  [org.apache.solr.handler.SnapPuller]
(pool-12-thread-1) Slave in sync with master.
2012-04-26 09:22:33,279 INFO  [org.apache.solr.handler.SnapPuller]
(pool-12-thread-1) Slave in sync with master.



 



impact of EdgeNGramFilterFactory on indexing process?

2012-04-26 Thread geeky2

Hello all,

i am experimenting with EdgeNGramFilterFactory - on two of the fieldTypes in
my schema.

   <filter class="solr.EdgeNGramFilterFactory" minGramSize="3"
maxGramSize="15" side="front"/>

i believe i understand this - but want to verify:

1) will this increase my index time?
2) will it increase the number of documents in my index?

thank you



Re: faceted searches - design question - facet field not part of qf search fields

2012-04-25 Thread geeky2
thank you BOTH, Erick and Hos for the insight.



correct location in chain for EdgeNGramFilterFactory ?

2012-04-24 Thread geeky2
hello all,

i want to experiment with the EdgeNGramFilterFactory at index time.

i believe this needs to go in post tokenization - but i am doing a pattern
replace as well as other things.

should the EdgeNGramFilterFactory go in right after the pattern replace?




<fieldType name="text_en_splitting" class="solr.TextField"
positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>

    <filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="\."
replacement="" replace="all"/>

    <!-- put EdgeNGramFilterFactory here? <=== -->

    <filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1" generateNumberParts="1" catenateWords="1"
catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory"
protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="\."
replacement="" replace="all"/>
    <filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1" generateNumberParts="1" catenateWords="0"
catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"
preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory"
protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
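
(to see where the grams land, i assume i can also hit the field analysis
handler - if it is enabled in solrconfig.xml - e.g.:)

curl "http://localhost:8983/solr/somecore/analysis/field?analysis.fieldtype=text_en_splitting&analysis.fieldvalue=BP2.1UAA"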

thanks for any help,





Re: solr replication failing with error: Master at: is not available. Index fetch failed

2012-04-24 Thread geeky2
hello,

thank you for the reply,

yes - master has been indexed.

ok - makes sense - the polling interval needs to change

i did check the solr war file on both boxes (master and slave).  they are
identical.  actually - if they were not identical - this would point to a
different issue altogether - since our deployment infrastructure - rolls the
war file to the slaves when you do a deployment on the master.

this has me stumped - not sure what to check next.





Re: solr replication failing with error: Master at: is not available. Index fetch failed

2012-04-24 Thread geeky2
that was it!

thank you.

i did notice something else in the logs now ...

what is the meaning or implication of the message, Connection reset.?



2012-04-24 12:59:19,996 INFO  [org.apache.solr.handler.SnapPuller]
(pool-12-thread-1) Slave in sync with master.
2012-04-24 12:59:39,998 INFO  [org.apache.solr.handler.SnapPuller]
(pool-12-thread-1) Slave in sync with master.
2012-04-24 12:59:59,997 SEVERE [org.apache.solr.handler.SnapPuller]
(pool-12-thread-1) Master at:
http://bogus:bogusport/somepath/somecore/replication/ is not available.
Index fetch failed. Exception: Connection reset
2012-04-24 13:00:19,998 INFO  [org.apache.solr.handler.SnapPuller]
(pool-12-thread-1) Slave in sync with master.
2012-04-24 13:00:40,004 INFO  [org.apache.solr.handler.SnapPuller]
(pool-12-thread-1) Slave in sync with master.
2012-04-24 13:00:59,992 INFO  [org.apache.solr.handler.SnapPuller]
(pool-12-thread-1) Slave in sync with master.
2012-04-24 13:01:19,993 INFO  [org.apache.solr.handler.SnapPuller]
(pool-12-thread-1) Slave in sync with master.
2012-04-24 13:01:39,992 INFO  [org.apache.solr.handler.SnapPuller]
(pool-12-thread-1) Slave in sync with master.
2012-04-24 13:01:59,989 INFO  [org.apache.solr.handler.SnapPuller]
(pool-12-thread-1) Slave in sync with master.
2012-04-24 13:02:19,990 INFO  [org.apache.solr.handler.SnapPuller]
(pool-12-thread-1) Slave in sync with master.
2012-04-24 13:02:39,989 INFO  [org.apache.solr.handler.SnapPuller]
(pool-12-thread-1) Slave in sync with master.
2012-04-24 13:02:59,991 INFO  [org.a



faceted searches - design question - facet field not part of qf search fields

2012-04-24 Thread geeky2


hello all,

this is more of a design / newbie question on how others combine faceted
search fields in to their requestHandlers.

say you have a request handler set up like below.

does it make sense (from a design perspective) to add a faceted search field
that is NOT part of the main search fields (itemNo, productType, brand) in
the qf param?

for example, augment the requestHandler below to include a faceted search on
itemDesc?

would this be confusing ? - to be searching across three fields - but
offering faceted suggestions on itemDesc?

just trying to understand how others approach this

thanks

  <requestHandler name="generalSearch" class="solr.SearchHandler"
default="false">
    <lst name="defaults">
      <str name="defType">edismax</str>
      <str name="echoParams">all</str>
      <int name="rows">10</int>
      <str name="qf">itemNo^1.0 productType^.8 brand^.5</str>
      <str name="q.alt">*:*</str>
    </lst>
    <lst name="appends">
    </lst>
    <lst name="invariants">
      <str name="facet">false</str>
    </lst>
  </requestHandler>



  




solr replication failing with error: Master at: is not available. Index fetch failed

2012-04-23 Thread geeky2
hello all,

environment: centOS and solr 3.5

i am attempting to set up replication betweeen two solr boxes (master and
slave).

i am getting the following in the logs on the slave box.

2012-04-23 10:54:59,985 SEVERE [org.apache.solr.handler.SnapPuller]
(pool-12-thread-1) Master at:
http://someip:someport/somepath/somecore/admin/replication/ is not
available. Index fetch failed. Exception: Invalid version (expected 2, but
10) or the data in not in 'javabin' format

master jvm (jboss host) is being started like this:

-Denable.master=true

slave jvm (jboss host) is being started like this:

-Denable.slave=true

does anyone have any ideas?

i have done the following:

used curl http://someip:someport/somepath/somecore/admin/replication/ from
slave to successfully see master

used ping from slave to master

switched out the dns name for master to hard coded ip address

made sure i can see
http://someip:someport/somepath/somecore/admin/replication/ in a browser
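
next i plan to hit the replication details command from the slave (path per
the wiki; my setup may differ), e.g.:

curl "http://someip:someport/somepath/somecore/replication?command=details"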


this is my request handler - i am using the same config file on both the
master and slave - but sending in the appropriate switch on start up (per
the solr wiki page on replication)

    <lst name="master">

      <str name="enable">${enable.master:false}</str>
      <str name="replicateAfter">startup</str>
      <str name="replicateAfter">commit</str>

      <str name="confFiles">schema.xml,stopwords.txt,elevate.xml</str>

      <str name="commitReserveDuration">00:00:10</str>
    </lst>

    <str name="maxNumberOfBackups">1</str>
    <lst name="slave">

      <str name="enable">${enable.slave:false}</str>
      <str
name="masterUrl">http://someip:someport/somecore/admin/replication/</str>

      <str name="pollInterval">00:00:20</str>

      <str name="compression">internal</str>

      <str name="httpConnTimeout">5000</str>
      <str name="httpReadTimeout">1</str>

    </lst>
  </requestHandler>


any suggestions would be great

thank you,
mark





Re: searching across multiple fields using edismax - am i setting this up right?

2012-04-13 Thread geeky2
thank you for the response.

it seems to be working well ;)

1) i tried your suggestion about removing the qt parameter - 

somecore/partItemNoSearch?q=dishwasher&debugQuery=on&rows=10

but this results in a 404 error message - is there some configuration i am
missing to support this short-hand syntax for specifying the requestHandler
in the url ?



2) ok - good suggestion.



3) yes it looks like it IS searching across all three (3) fields.

i noticed that for the itemNo field, it reduced the search string from
dishwasher to dishwash - is this because of stemming on the field type used
for the itemNo field?

<lst name="debug"><str name="rawquerystring">dishwasher</str><str
name="querystring">dishwasher</str><str
name="parsedquery">+DisjunctionMaxQuery((brand:dishwasher^0.5 |
itemNo:dishwash | productType:dishwasher^0.8))</str><str
name="parsedquery_toString">+(brand:dishwasher^0.5 | itemNo:dishwash |
productType:dishwasher^0.8)</str>







is there a downside to combining search fields with copyfield?

2012-04-12 Thread geeky2
hello everyone,

can people give me their thoughts on this.

currently, my schema has individual fields to search on.

are there advantages or disadvantages to taking several of the individual
search fields and combining them in to a single search field?

would this affect search times, term tokenization or possibly other things.

example of individual fields

brand
category
partno

example of a single combined search field

part_info (would combine brand, category and partno)

thank you for any feedback
mark







Re: is there a downside to combining search fields with copyfield?

2012-04-12 Thread geeky2


You end up with one multivalued field, which means that you can only
have one analyzer chain.


actually two of the three fields being considered for combination in to a
single field ARE multivalued fields.

would this be an issue?


  With separate fields, each field can be
analyzed differently.  Also, if you are indexing and/or storing the
individual fields, you may have data duplication in your index, making
it larger and increasing your disk/RAM requirements.


this makes sense



  That field will
have a higher termcount than the individual fields, which means that
searches against it will naturally be just a little bit slower.


ok


  Your
application will not have to do as much work to construct a query, though.


actually this is the primary reason this came up.  


If you are already planning to use dismax/edismax, then you don't need
the overhead of a copyField.  You can simply provide access to (e)dismax
search with the qf (and possibly pf) parameters predefined, or your
application can provide these parameters.

http://wiki.apache.org/solr/ExtendedDisMax


can you elaborate on this and how EDisMax would preclude the need for
copyfield?

i am using extended dismax now in my response handlers.

here is an example of one of my requestHandlers

  <requestHandler name="partItemNoSearch" class="solr.SearchHandler"
default="false">
    <lst name="defaults">
      <str name="defType">edismax</str>
      <str name="echoParams">all</str>
      <int name="rows">5</int>
      <str name="qf">itemNo^1.0</str>
      <str name="q.alt">*:*</str>
    </lst>
    <lst name="appends">
      <str name="fq">itemType:1</str>
      <str name="sort">rankNo asc, score desc</str>
    </lst>
    <lst name="invariants">
      <str name="facet">false</str>
    </lst>
  </requestHandler>
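
(as i understand it, since qf sits in the defaults section, i could also
override it per-request to experiment - rough example, field names from my
schema:)

curl "http://localhost:8983/solr/somecore/select?qt=partItemNoSearch&q=dishwasher&qf=itemNo^1.0+brand^0.5"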






Thanks,
Shawn 



searching across multiple fields using edismax - am i setting this up right?

2012-04-12 Thread geeky2
hello all,

i just want to check to make sure i have this right.

i was reading on this page: http://wiki.apache.org/solr/ExtendedDisMax,
thanks to shawn for educating me.

*i want the user to be able to fire a requestHandler but search across
multiple fields (itemNo, productType and brand) WITHOUT them having to
specify in the query url what fields they want / need to search on*

this is what i have in my request handler


  <requestHandler name="partItemNoSearch" class="solr.SearchHandler"
default="false">
    <lst name="defaults">
      <str name="defType">edismax</str>
      <str name="echoParams">all</str>
      <int name="rows">5</int>
      <str name="qf">itemNo^1.0 productType^.8 brand^.5</str>
      <str name="q.alt">*:*</str>
    </lst>
    <lst name="appends">
      <str name="sort">rankNo asc, score desc</str>
    </lst>
    <lst name="invariants">
      <str name="facet">false</str>
    </lst>
  </requestHandler>

this would be an example of a single term search going against all three of
the fields

http://bogus:bogus/somecore/select?qt=partItemNoSearch&q=dishwasher&debugQuery=on&rows=100

this would be an example of a multiple term search across all three of the
fields

http://bogus:bogus/somecore/select?qt=partItemNoSearch&q=dishwasher
123-xyz&debugQuery=on&rows=100


do i understand this correctly?

thank you,
mark






why does building war from source produce a different size file?

2012-03-29 Thread geeky2

hello all,

i have been pulling down the 3.5 solr war file from the mirror site.

the size of this file is:

6403279 Nov 22 14:54 apache-solr-3.5.0.war

when i build the war file from source - i get a different sized file:

 ./dist/apache-solr-3.5-SNAPSHOT.war

6404098 Mar 29 11:41 ./dist/apache-solr-3.5-SNAPSHOT.war

am i building from the wrong source?







authentication for solr admin page?

2012-03-28 Thread geeky2
hello,

environment:

running solr 3.5 under jboss 5.1

i have been searching the user list along with the locations below - to find
out how you require a user to authenticate in to the solr /admin page.  i
thought this would be a common issue - but maybe not ;)

any help would be appreciated

thank you,
mark



http://drupal.org/node/658466

http://wiki.apache.org/solr/SolrSecurity#Write_Your_Own_RequestHandler_or_SearchComponent







RE: preventing words from being indexed in spellcheck dictionary?

2012-03-28 Thread geeky2
thank you, James.



Re: authentication for solr admin page?

2012-03-28 Thread geeky2
update -

ok - i was reading about replication here:

http://wiki.apache.org/solr/SolrReplication

and noticed comments in the solrconfig.xml file related to HTTP Basic
Authentication and the usage of the following tags:

<str name="httpBasicAuthUser">username</str>
<str name="httpBasicAuthPassword">password</str>

*Can i place these tags in the request handler to achieve an authentication
scheme for the /admin page?*

// snipped from the solrconfig.xml file

  <requestHandler name="/admin/"
class="org.apache.solr.handler.admin.AdminHandlers"/>

thanks for any help
mark



preventing words from being indexed in spellcheck dictionary?

2012-03-27 Thread geeky2
hello all,

i am creating a spellcheck dictionary from the itemDescSpell field in my
schema.

is there a way to prevent certain words from entering the dictionary - as
the dictionary is being built?

thanks for any help
mark

// snipped from solarconfig.xml

    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="field">itemDescSpell</str>
      <str name="buildOnOptimize">true</str>
      <str name="spellcheckIndexDir">spellchecker_mark</str>





RE: preventing words from being indexed in spellcheck dictionary?

2012-03-27 Thread geeky2
thank you very much for the info ;)





RE: preventing words from being indexed in spellcheck dictionary?

2012-03-27 Thread geeky2
hello,

should i apply the StopFilterFactory at index time or query time.

right now - per the schema below - i am applying it at BOTH index time and
query time.

is this correct?

thank you,
mark


// snipped from schema.xml



<field name="itemDescSpell" type="textSpell"/>


  <fieldType name="textSpell" class="solr.TextField"
positionIncrementGap="100" stored="false" multiValued="true">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
ignoreCase="true" expand="true"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt"/>
      <filter class="solr.StandardFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt"/>
      <filter class="solr.StandardFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>




spellcheck file format - multiple words on a line?

2012-03-23 Thread geeky2
hello all,

for business reasons, we are sourcing the spellcheck file from another
business group.  

the file we receive looks like the example data below

can solr support this type of format - or do i need to process this file in
to a format that has a single word on a single line?

thanks for any help
mark



// snipped from spellcheck file sourced from business group

14-INCH CHAIN
14-INCH RIGHT TINE
1/4 open end ignition wrench
150 DEGREES CELSIUS
15 foot I wire
15 INCH
15 WATT
16 HORSEPOWER ENGINE
16 HORSEPOWER GASOLINE ENGINE
16-INCH BAR
16-INCH CHAIN
16l Cross
16p SIXTEEN PIECE FLAT FLEXIBLE CABLE
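
(if it turns out single words are required, i assume i can flatten the file
with something like:)

tr -s ' \t' '\n' < spellcheck_source.txt | tr '[:upper:]' '[:lower:]' | sort -u > spellings.txt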




suggestions on automated testing for solr output

2012-03-16 Thread geeky2
hello all,

i know this is never a fun topic for people, but our SDLC mandates that we
have unit test cases that attempt to validate the output from specific solr
queries.

i have some ideas on how to do this, but would really appreciate feedback
from anyone that has done this or is doing it now.

the ideal situation (for this environment) would be something script based
and automated.
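
(the direction i am leaning - a bash script that fires a known query and
asserts on numFound - rough sketch, the url and expected count are
placeholders:)

EXPECTED=42
ACTUAL=$(curl -s "http://localhost:8983/solr/somecore/select?q=itemNo:9030&rows=0" \
  | sed -n 's/.*numFound="\([0-9]*\)".*/\1/p')
if [ "$ACTUAL" = "$EXPECTED" ]; then
  echo "PASS itemNo:9030"
else
  echo "FAIL itemNo:9030 expected=$EXPECTED actual=$ACTUAL"
fi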

thanks for any input,
mark




does solr have a mechanism for intercepting requests - before they are handed off to a request handler

2012-03-09 Thread geeky2
hello all,

does solr have a mechanism that could intercept a request (before it is
handed off to a request handler).

the intent (from the business) is to send in a generic request - then
pre-parse the url and send it off to a specific request handler.

thank you,
mark 



need input - lessons learned or best practices for data imports

2012-03-05 Thread geeky2
hello all,

we are approaching the time when we will move our first solr core in to a
more production like environment.  as a precursor to this, i am attempting
to write some documents on impact assessment and batch load / data import
strategies.

does anyone have processes or lessons learned - that they can share?

maybe a good place to start - but not limited to - would be how do people
monitor data imports (we are using a very simple DIH hooked to an informix
schema) and send out appropriate notifications?

thank you for any help or suggestions,
mark




does the location of a match (within a field) affect the score?

2012-03-02 Thread geeky2
hello all,

example:

i have a field named itemNo

the user does a search, itemNo:665

there are three document in the core, that look like this

doc1 - itemNo = 1237899*665*

doc2 - itemNo = *665*1237899

doc3 - itemNo = 123*665*7899



does the location or placement of the search string (beginning, middle, end)
affect the scoring of the document?







need to support bi-directional synonyms

2012-02-22 Thread geeky2
hello all,

i need to support the following:

if the user enters sprayer in the desc field - then they get results for
BOTH sprayer and washer.

and in the other direction

if the user enters washer in the desc field - then they get results for
BOTH washer and sprayer. 

would i set up my synonym file like this?

assuming expand = true..

sprayer => washer
washer => sprayer
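
(or - if i understand expand=true correctly - a single equivalence line
should map each term to both; appended like so:)

cat >> synonyms.txt <<'EOF'
sprayer, washer
EOF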

thank you,
mark



proper syntax for using sort query parameter in responseHandler

2012-02-17 Thread geeky2
what is the proper syntax for including sort directive in my responseHandler?

i tried this but got an error:


  <requestHandler name="partItemNoSearch" class="solr.SearchHandler"
default="false">
    <lst name="defaults">
      <str name="defType">edismax</str>
      <str name="echoParams">all</str>
      <int name="rows">10</int>
      <str name="qf">itemNo^1.0</str>
      <str name="q.alt">*:*</str>
      <str name="sort">rankNo desc</str>
    </lst>
    <lst name="appends">
      <str name="fq">itemType:1</str>
    </lst>
    <lst name="invariants">
      <str name="facet">false</str>
    </lst>
  </requestHandler>


thank you
mark



RE: spellcheck configuration not providing suggestions or corrections

2012-02-13 Thread geeky2
hello 

thank you for the suggestion - however this did not work.

i went in to solrconfig and change the count to 20 - then restarted the
server and then did a reimport.



is it possible that i am not firing the request handler that i think i am
firing ?


  <requestHandler name="/search"
class="org.apache.solr.handler.component.SearchHandler">
    <lst name="defaults">

      <str name="spellcheck.dictionary">default</str>

      <str name="spellcheck.onlyMorePopular">false</str>

      <str name="spellcheck.extendedResults">true</str>

      <str name="spellcheck.count">20</str>
      <str name="echoParams">explicit</str>
    </lst>
    <arr name="last-components">
      <str>spellcheck</str>
    </arr>
  </requestHandler>


query sent to server:

http://hfsthssolr1.intra.searshc.com:8180/solrpartscat/core1/select/?q=itemDescSpell%3Agusket%0D%0A&version=2.2&start=0&rows=10&indent=on&spellcheck=true&spellcheck.build=true

results:

<response><lst name="responseHeader"><int name="status">0</int><int
name="QTime">0</int><lst name="params"><str name="spellcheck">true</str><str
name="indent">on</str><str name="start">0</str><str
name="q">itemDescSpell:gusket
</str><str name="spellcheck.build">true</str><str name="rows">10</str><str
name="version">2.2</str></lst></lst><result name="response" numFound="0"
start="0"/></response>



RE: spellcheck configuration not providing suggestions or corrections

2012-02-13 Thread geeky2
thank you sooo much - that was it.

also - thank you for the tip on which field to hit, eg itemDesc instead of
itemDescSpell.

thank you,
mark





Re: struggling with solr.WordDelimiterFilterFactory and periods . or dots

2012-02-10 Thread geeky2
hello,


Or does your field in schema.xml have anything like
autoGeneratePhraseQueries=true in it?


there is no reference to this in our production schema.

this is extremely confusing.

i am not completely clear on the issue?

reviewing our previous messages - it looks like the data is being tokenized
correctly according to the analysis page and output from Luke.

it also looks like the definition of the field and field type is correct in
the schema.xml

it also looks like there is no errant data (quotes) being introduced in to
the query string submitted to solr:

example:

http://hfsthssolr1.intra.searshc.com:8180/solrpartscat/core1/select?indent=on&version=2.2&q=itemNo%3ABP21UAA&fq=&start=0&rows=10&fl=*%2Cscore&qt=&wt=&debugQuery=on&explainOther=&hl.fl=

so - does the real issue reside in HOW the query is being constructed /
parsed ???

and if so - what drives this query to become a MultiPhraseQuery with
embedded quotes?

<lst name="debug"><str name="rawquerystring">itemNo:BP21UAA
</str><str name="querystring">itemNo:BP21UAA
</str><str name="parsedquery">MultiPhraseQuery(itemNo:"bp 21 (uaa
bp21uaa)")</str><str name="parsedquery_toString">itemNo:"bp 21 (uaa
bp21uaa)"</str>

please note - i also mocked up a simple test on my personal linux box - just
using the solr 3.5 distro (we are using 3.3.0 on our production box under
centOS)

i was able to get a simple test to work and yes - my query does look
different

output from my simple mock up on my personal box:

http://localhost:8983/solr/select?indent=on&version=2.2&q=manu%3ABP21UAA&fq=&start=0&rows=10&fl=*%2Cscore&qt=&wt=&debugQuery=on&explainOther=&hl.fl=

<lst name="debug"><str name="rawquerystring">manu:BP21UAA</str><str
name="querystring">manu:BP21UAA</str><str name="parsedquery">manu:bp manu:21
manu:uaa manu:bp21uaa</str><str name="parsedquery_toString">manu:bp manu:21
manu:uaa manu:bp21uaa</str><lst name="explain">

schema.xml

<fieldType name="text_en_splitting" class="solr.TextField"
positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords_en.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
generateNumberParts="1" catenateWords="1" catenateNumbers="1"
catenateAll="1" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords_en.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
generateNumberParts="1" catenateWords="0" catenateNumbers="0"
catenateAll="1" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

<field name="manu" type="text_en_splitting" indexed="true" stored="true"
omitNorms="true"/>

any suggestions would be greatly appreciated.

mark






Re: struggling with solr.WordDelimiterFilterFactory and periods . or dots

2012-02-09 Thread geeky2


OK, first question is why are you searching on two different values?
Is that intentional? 


yes - our users have to be able to locate a part or model number (that may
or may not have periods in that number) even if they do NOT enter the number
with the embedded periods.  

example: 

actual part number in our database is BP2.1UAA

however the user needs to be able to search on BP21UAA and find that part.

there are business reasons why a user may see something different in the
field than is actually in the database.

does this make sense?




If I'm reading your problem right, you should
be able to get/not get any response just by toggling whether the
period is in the search URL, right? 


yes - simply put - the user MUST get a hit on the above mentioned part if
they enter BP21UAA or BP2.1UAA.


But assuming that's not the problem, there's something you're
not telling us. In particular, why is this parsing as MultiPhraseQuer?


sorry - i did not know i was doing this or how it happened - it was not
intentional and i did not notice this until your posting.  i am not sure of
the implications related to this or what it means to have something as a
MultiPhraseQuery.


Are you putting quotes in somehow, either through the URL or by
something in your solrconfig.xml?


i did not use quotes in the url - i cut and pasted the urls for my tests in
the message thread.  i do not see quotes as part of the url in my previous
post.

what would i be looking for in the solrconfig.xml file that would force the
MultiPhraseQuery?

it seems that this is the crux of the issue - but i am not sure how to
determine what is manifesting the quotes?  as previously stated - the quotes
are not being entered via the url - they are pasted (in this message thread)
exactly as i pulled them from the browser.

thank you,
mark







Re: struggling with solr.WordDelimiterFilterFactory and periods . or dots

2012-02-08 Thread geeky2
hello,

thank you for the reply.

yes - i did re-index after the changes to the schema.

also - thank you for the direction on using the analyzer - but i am not sure
if i am interpreting the feedback from the analyzer correctly.

here is what i did:

in the Field value (Index) box - i placed this: BP2.1UAA

in the Field value (Query) box - i placed this: BP21UAA

then after hitting the Analyze button - i see the following:

Under Index Analyzer for: 

org.apache.solr.analysis.WordDelimiterFilterFactory {splitOnCaseChange=1,
generateNumberParts=1, catenateWords=1, luceneMatchVersion=LUCENE_33,
generateWordParts=1, catenateAll=1, catenateNumbers=1}

i see 

position    1    2    3    4
term text   BP   2    1    UAA
                 21        BP21UAA

Under Query Analyzer for:

org.apache.solr.analysis.WordDelimiterFilterFactory {splitOnCaseChange=1,
generateNumberParts=1, catenateWords=1, luceneMatchVersion=LUCENE_33,
generateWordParts=1, catenateAll=1, catenateNumbers=1}

i see 

position    1    2    3
term text   BP   21   UAA
                      BP21UAA

the above information leads me to believe that i should have BP21UAA as an
indexed term generated from the BP2.1UAA value coming from the database.

also - the query analysis lead me to believe that i should find a document
when i search on BP21UAA in the itemNo field

do i have this correct

am i missing something here?

i am still unable to get a hit when i search on BP21UAA in the itemNo field.

thank you,
mark



Re: struggling with solr.WordDelimiterFilterFactory and periods . or dots

2012-02-08 Thread geeky2
hello,

thanks for sticking with me on this ...very frustrating 

ok - i did perform the query with the debug parms using two scenarios:

1) a successful search (where i insert the period / dot) in to the itemNo
field and the search returns a document.

itemNo:BP2.1UAA

http://hfsthssolr1.intra.searshc.com:8180/solrpartscat/core1/select/?q=itemNo%3ABP2.1UAA&version=2.2&start=0&rows=10&indent=on&debugQuery=on

results from debug

<?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">1</int>
  <lst name="params">
    <str name="indent">on</str>
    <str name="rows">10</str>
    <str name="version">2.2</str>
    <str name="debugQuery">on</str>
    <str name="start">0</str>
    <str name="q">itemNo:BP2.1UAA</str>
  </lst>
</lst>
<result name="response" numFound="1" start="0">
  <doc>
    <arr name="brand"><str>PHILIPS</str></arr>
    <str name="groupId">0333500</str>
    <str name="id">0333500,1549  ,BP2.1UAA   </str>
    <str name="itemDesc">PLASMA TELEVISION</str>
    <str name="itemNo">BP2.1UAA   </str>
    <int name="itemType">2</int>
    <arr name="model"><str>BP2.1UAA   </str></arr>
    <arr name="productType"><str>Plasma Television^</str></arr>
    <int name="rankNo">0</int>
    <str name="supplierId">1549  </str>
  </doc>
</result>
<lst name="debug">
  <str name="rawquerystring">itemNo:BP2.1UAA</str>
  <str name="querystring">itemNo:BP2.1UAA</str>
  <str name="parsedquery">MultiPhraseQuery(itemNo:"bp 2 (1 21) (uaa bp21uaa)")</str>
  <str name="parsedquery_toString">itemNo:"bp 2 (1 21) (uaa bp21uaa)"</str>
  <lst name="explain">
    <str name="0333500,1549  ,BP2.1UAA   ">
22.539911 = (MATCH) weight(itemNo:"bp 2 (1 21) (uaa bp21uaa)" in 134993), product of:
  0.9994 = queryWeight(itemNo:"bp 2 (1 21) (uaa bp21uaa)"), product of:
    45.079826 = idf(itemNo: bp=829 2=29303 1=43943 21=6716 uaa=32 bp21uaa=1)
    0.02218287 = queryNorm
  22.539913 = (MATCH) fieldWeight(itemNo:"bp 2 (1 21) (uaa bp21uaa)" in 134993), product of:
    1.0 = tf(phraseFreq=1.0)
    45.079826 = idf(itemNo: bp=829 2=29303 1=43943 21=6716 uaa=32 bp21uaa=1)
    0.5 = fieldNorm(field=itemNo, doc=134993)
</str>
  </lst>
  <str name="QParser">LuceneQParser</str>
  <lst name="timing">
    <double name="time">1.0</double>
    <lst name="prepare">
      <double name="time">0.0</double>
      <lst name="org.apache.solr.handler.component.QueryComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.FacetComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.HighlightComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.StatsComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.DebugComponent">
        <double name="time">0.0</double>
      </lst>
    </lst>
    <lst name="process">
      <double name="time">1.0</double>
      <lst name="org.apache.solr.handler.component.QueryComponent">
        <double name="time">1.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.FacetComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.HighlightComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.StatsComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.DebugComponent">
        <double name="time">0.0</double>
      </lst>
    </lst>
  </lst>
</lst>
</response>
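
as a sanity check on the explain block above - my reading is that this is
just the standard lucene 3.x tf-idf formula, and the numbers line up within
print rounding:

\[
\text{score} = \underbrace{(\text{idf} \cdot \text{queryNorm})}_{\text{queryWeight}}
  \cdot \underbrace{(\text{tf} \cdot \text{idf} \cdot \text{fieldNorm})}_{\text{fieldWeight}}
  = (45.079826 \cdot 0.02218287) \cdot (1.0 \cdot 45.079826 \cdot 0.5)
  \approx 1.0 \cdot 22.539913 \approx 22.539911
\]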







2) a NON-successful search (where i do NOT insert a period / dot) into the
itemNo field, and the search does NOT return a document:

 itemNo:BP21UAA

http://hfsthssolr1.intra.searshc.com:8180/solrpartscat/core1/select/?q=itemNo%3ABP21UAA&version=2.2&start=0&rows=10&indent=on&debugQuery=on

<?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">1</int>
  <lst name="params">
    <str name="indent">on</str>
    <str name="rows">10</str>
    <str name="version">2.2</str>
    <str name="debugQuery">on</str>
    <str name="start">0</str>
    <str name="q">itemNo:BP21UAA</str>
  </lst>
</lst>
<result name="response" numFound="0" start="0"/>
<lst name="debug">
  <str name="rawquerystring">itemNo:BP21UAA</str>
  <str name="querystring">itemNo:BP21UAA</str>
  <str name="parsedquery">MultiPhraseQuery(itemNo:"bp 21 (uaa bp21uaa)")</str>
  <str name="parsedquery_toString">itemNo:"bp 21 (uaa bp21uaa)"</str>
  <lst name="explain"/>
  <str name="QParser">LuceneQParser</str>
  <lst name="timing">
    <double name="time">1.0</double>
    <lst name="prepare">
      <double name="time">1.0</double>
      <lst name="org.apache.solr.handler.component.QueryComponent">
        <double name="time">1.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.FacetComponent">
        <double name="time">0.0</double>
      </lst>
      <lst 
