Re: recip function error

2014-10-24 Thread eShard
Thank you very much for your replies.
I discovered there was a typo in the function I was given.
One of the parentheses was in the wrong spot.
It should be this:
boost=recip(ms(NOW/HOUR,general_modifydate),3.16e-11,0.08,0.05)

And now it works with edismax! Strange...
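For anyone curious about the numbers: `recip(x,m,a,b)` computes `a / (m*x + b)`, so with `ms(NOW/HOUR,general_modifydate)` as `x`, the boost decays as the document ages. A small sketch of the arithmetic with the constants from the query above (the age values are illustrative):

```python
def recip(x, m=3.16e-11, a=0.08, b=0.05):
    """Solr's recip(x,m,a,b) = a / (m*x + b)."""
    return a / (m * x + b)

# A brand-new document (age 0 ms) gets the maximum boost, a/b.
print(recip(0))

# m = 3.16e-11 is roughly 1 / (milliseconds in a year), so m*x is about 1
# for a document last modified a year ago, and the boost is much smaller.
one_year_ms = 1000 * 60 * 60 * 24 * 365
print(recip(one_year_ms))
```

The takeaway is that the constants tune how quickly freshness stops mattering: `a/b` sets the ceiling and `m` sets the time scale of the decay.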

Thanks again,



--
View this message in context: 
http://lucene.472066.n3.nabble.com/recip-function-error-tp4165600p4165713.html
Sent from the Solr - User mailing list archive at Nabble.com.


recip function error

2014-10-23 Thread eShard
Good evening,
I'm using solr 4.0 Final.
I tried using this function
boost=recip(ms(NOW/HOUR,startdatez,3.16e-11.0,0.08,0.05))
but it fails with this error:
org.apache.lucene.queryparser.classic.ParseException: Expected ')' at
position 29 in 'recip(ms(NOW/HOUR,startdatez,3.16e-11.0,0.08,0.05))'

I applied this patch https://issues.apache.org/jira/browse/SOLR-3522 
Rebuilt and redeployed AND I get the exact same error.
I only copied over the new jars and the war file; none of the other libraries
seemed to have changed.
The patch is in Solr core, so I figured I was safe.

Does anyone know how to fix this?

Thanks,





--
View this message in context: 
http://lucene.472066.n3.nabble.com/recip-function-error-tp4165600.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: recip function error

2014-10-23 Thread eShard
Thanks, we're planning on going to 4.10.1 in a few months.
I discovered that recip only works with dismax; I use edismax by default.
Does anyone know why I can't use recip with edismax?

I hope this is fixed in 4.10.1...


Thanks,



--
View this message in context: 
http://lucene.472066.n3.nabble.com/recip-function-error-tp4165600p4165613.html
Sent from the Solr - User mailing list archive at Nabble.com.


Why does the q parameter change?

2014-09-25 Thread eShard
Good afternoon all,
I just implemented a phrase search, and the parsed query gets changed from
"rapid prototyping" to "rapid prototype".
I used the Solr analyzer and "prototyping" was unchanged, so I think I ruled
out the tokenizer.
So can anyone tell me what's going on?
Here's the query:
q=rapid prototyping&defType=edismax&qf=text&pf2=text^40&ps=0

Here's the debug output:
As you can see, "prototyping" gets changed to just "prototype". What's causing
this, and how do I turn it off?
Thanks,

<lst name="debug">
  <lst name="queryBoosting">
    <str name="q">rapid prototyping</str>
    <null name="match"/>
  </lst>
  <str name="rawquerystring">rapid prototyping</str>
  <str name="querystring">rapid prototyping</str>
  <str name="parsedquery">(+((DisjunctionMaxQuery((text:rapid))
    DisjunctionMaxQuery((text:prototype)))~2)
    DisjunctionMaxQuery((text:"rapid prototype"^40.0)))/no_coord</str>
  <str name="parsedquery_toString">+(((text:rapid) (text:prototype))~2)
    (text:"rapid prototype"^40.0)</str>
  <str name="QParser">ExtendedDismaxQParser</str>



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Why-does-the-q-parameter-change-tp4161179.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Why does the q parameter change?

2014-09-25 Thread eShard
Ok, I think I'm on to something.
I omitted this parameter which means it is set to false by default on my
text field.
I need to set it to true and see what happens...
autoGeneratePhraseQueries=true
If I'm reading the wiki right, this parameter if true will preserve phrase
queries...
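For intuition, here's a toy illustration of what autoGeneratePhraseQueries changes (this is a simplification, not Solr's actual code): when the analyzer splits a single whitespace-separated query token into several tokens, the flag decides whether they become one phrase query or separate term queries.

```python
def build_query(token, analyzed_tokens, auto_generate_phrase=False):
    """Toy model of query building for one query-string token whose
    analysis produced several tokens (e.g. a filter splitting 'wi-fi')."""
    if len(analyzed_tokens) == 1:
        return analyzed_tokens[0]
    if auto_generate_phrase:
        # tokens must appear adjacent, like an explicit quoted phrase
        return '"' + " ".join(analyzed_tokens) + '"'
    # default (false): tokens become independent term queries
    return " ".join(analyzed_tokens)

print(build_query("wi-fi", ["wi", "fi"]))
print(build_query("wi-fi", ["wi", "fi"], auto_generate_phrase=True))
```

The token names here are made up; the point is only the phrase-vs-terms distinction the flag controls.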





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Why-does-the-q-parameter-change-tp4161179p4161185.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Why does the q parameter change?

2014-09-25 Thread eShard
No, apparently it's the KStemFilter.
Should I turn this off at query time?
I'll put this in another question...




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Why-does-the-q-parameter-change-tp4161179p4161199.html
Sent from the Solr - User mailing list archive at Nabble.com.


Best practice for KStemFilter query or index or both?

2014-09-25 Thread eShard
Good afternoon,
Here's my configuration for a text field.
I have the same configuration for index and query time.
Is this valid? 
What's the best practice for these: query time, index time, or both?
For synonyms, I've read conflicting reports on when to apply them, but I'm
currently changing them over to index time only.

Thanks,

<fieldType name="text_general" class="solr.TextField"
    positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1"
        generateNumberParts="1"
        catenateWords="0"
        catenateNumbers="0"
        catenateAll="0"
        preserveOriginal="1"
    />
    <filter class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true"/>

    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1"
        generateNumberParts="1"
        catenateWords="0"
        catenateNumbers="0"
        catenateAll="0"
        preserveOriginal="1"
    />
    <filter class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
  <analyzer type="select">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1"
        generateNumberParts="1"
        catenateWords="0"
        catenateNumbers="0"
        catenateAll="0"
        preserveOriginal="1"
    />
    <filter class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
</fieldType>





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Best-practice-for-KStemFilter-query-or-index-or-both-tp4161201.html
Sent from the Solr - User mailing list archive at Nabble.com.


I need a replacement for the QueryElevation Component

2014-07-08 Thread eShard
Good morning to one and all,
I'm using Solr 4.0 Final and I've been struggling mightily with the
elevation component.
It is too limited for our needs; it doesn't handle phrases very well and I
need to have more than one doc with the same keyword or phrase.
So, I need a better solution. One that allows us to tag the doc with
keywords that clearly identify it as a promoted document would be ideal.
I tried using an external file field but that only allows numbers and not
strings (please correct me if I'm wrong)
EFF would be ideal if there is a way to make it take strings.
I also need an easy way to add these tags to specific docs.
If possible, I would like to avoid creating a separate elevation core but it
may come down to that...

Thank you, 




--
View this message in context: 
http://lucene.472066.n3.nabble.com/I-need-a-replacement-for-the-QueryElevation-Component-tp4146077.html
Sent from the Solr - User mailing list archive at Nabble.com.


Can the elevation component work with synonyms?

2014-06-06 Thread eShard
Good morning Solr compatriots,
I'm using Solr 4.0 Final and I have synonyms.txt in my schema (only at query
time) like so:
<fieldType name="text_general" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1"
        generateNumberParts="1"
        catenateWords="0"
        catenateNumbers="0"
        catenateAll="0"
        preserveOriginal="1"
    />
    <filter class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true"/>

    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1"
        generateNumberParts="1"
        catenateWords="0"
        catenateNumbers="0"
        catenateAll="0"
        preserveOriginal="1"
    />
    <filter class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
  <analyzer type="select">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1"
        generateNumberParts="1"
        catenateWords="0"
        catenateNumbers="0"
        catenateAll="0"
        preserveOriginal="1"
    />
    <filter class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
</fieldType>

However, when I try to call my /elevate handler, the synonyms are factored
in but none of the results say [elevated]=true.
I'm assuming this is because the elevation must be an exact match and the
synonyms are expanding it beyond that so elevation is thwarted.
For example, if I have TV elevated and TV is also in synonyms.txt then the
query gets expanded to text:TV text:television.

Is there any way to get the elevation to work correctly with synonyms?

BTW
(I did find a custom synonym handler that works but this will require
significant changes to the front end and I'm not sure it will break if and
when we finally upgrade solr)
Here's the custom synonym filter (I had to drop the code in and rebuild
solr.war to get it to work):
https://github.com/healthonnet/hon-lucene-synonyms 
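The behavior described above is consistent with elevation keying off the exact query text while the synonym-expanded query is what actually runs. A toy model of that interaction (not Solr's implementation; the elevate map, synonym map, and doc ids are made up):

```python
# Hypothetical elevate.xml contents: query text -> doc ids forced to the top
ELEVATE = {"tv": ["doc42"]}
# Hypothetical synonyms.txt entry, applied at query time with expand=true
SYNONYMS = {"tv": ["tv", "television"]}

def analyze(query):
    """Toy query-time analysis: lowercase + synonym expansion."""
    out = []
    for tok in query.lower().split():
        out.extend(SYNONYMS.get(tok, [tok]))
    return out

def elevated_docs(query):
    """Elevation is looked up against the analyzed query text, so once
    synonyms rewrite the query, the lookup key no longer matches."""
    key = " ".join(analyze(query))
    return ELEVATE.get(key, [])

print(ELEVATE.get("tv"))      # a direct, unexpanded match would elevate
print(elevated_docs("TV"))    # after expansion the key is "tv television"
```

In this model, either elevating the expanded form or keeping synonyms out of the elevation path would restore the match, which mirrors the workarounds discussed in the thread.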




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Can-the-elevation-component-work-with-synonyms-tp4140423.html
Sent from the Solr - User mailing list archive at Nabble.com.


How to build Solr4.0 Final?

2014-05-30 Thread eShard
Good morning,
My company uses Solr 4.0 Final and I need to add some code to it and
recompile.
However, when I rebuild, all of the jars and the war file say Solr 5.0!
I'm using the old build.xml file from 4.0 so I don't know why it's
automatically upgrading.

How do I force it to build the older version of Solr?

Thank you,




--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-build-Solr4-0-Final-tp4138918.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to build Solr4.0 Final?

2014-05-30 Thread eShard
Ok, I think I figured it out.
Somehow my Solr4.0Final project was accidentally updated to 5.0.
The solr/build.xml was fine.
The build.xml file at the top level was pointed at a 5.0-SNAPSHOT.

I need to pull down the 4.0 and start from scratch.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-build-Solr4-0-Final-tp4138918p4138922.html
Sent from the Solr - User mailing list archive at Nabble.com.


How to exclude a mimetype in tika?

2014-03-26 Thread eShard
Good afternoon,
I'm using solr 4.0 Final
There are movies hidden in zip files that need to be excluded from the index.
I can't filter the movies in the crawler because then I would have to exclude
all zip files.
I was told I can have Tika skip the movies, but the details are escaping me at
this point.
How do I exclude a mime type in the Tika configuration?
I assume it's something I add in the update/extract handler but I'm not
sure.

Thanks,




--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-exclude-a-mimetype-in-tika-tp4127168.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Can the solr dataimporthandler consume an atom feed?

2014-03-25 Thread eShard
Gora! It works now! 
You are amazing! thank you so much!
I dropped the atom: from the xpath and everything is working.
I did have a typo that might have been causing issues too.
thanks again!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Can-the-solr-dataimporthandler-consume-an-atom-feed-tp4126134p4126887.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Can the solr dataimporthandler consume an atom feed?

2014-03-24 Thread eShard
The only message I get is:
 Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.
Requests: 1, Skipped: 0

And there are no errors in the log.

Here's what the ibm atom feed looks like:

<?xml version="1.0" encoding="utf-16"?>
<atom:feed xmlns:atom="http://www.w3.org/2005/Atom"
    xmlns:wplc="http://www.ibm.com/wplc/atom/1.0"
    xmlns:age="http://purl.org/atompub/age/1.0"
    xmlns:snx="http://www.ibm.com/xmlns/prod/sn"
    xmlns:lconn="http://www.ibm.com/lotus/connections/seedlist/atom/1.0">

  <atom:id>https://[redacted]/files/seedlist/myserver?Action=GetDocuments&amp;Format=ATOM&amp;Locale=en_US&amp;Range=2&amp;Start=0</atom:id>
  <atom:link
      href="https://[redacted]/files/seedlist/myserver?Action=GetDocuments&amp;Range=2&amp;Start=1000&amp;Format=ATOM&amp;Locale=en_US&amp;State=U0VDT05EXzIwMTQtMDMtMTMgMTY6MjM6NTguODRfMjAxMS0wNi0wNiAwODowNDoxNC42MjJfNmQ1YzQ3MWMtYTM3ZS00ZjlmLWE0OGEtZWZjYjMyZjU2NDgzXzEwMDBfZmFsc2U%3D"
      rel="next" type="application/atom+xml" title="Next page"/>
  <atom:generator xml:lang="en-US" version="1.2"
      lconn:version="4.0.0.0">Seedlist Service Backend System</atom:generator>
  <atom:category term="ContentSourceType/Files"
      scheme="com.ibm.wplc.taxonomy://feature_taxonomy"
      label="Files"/>
  <atom:title xml:lang="en-US">Files : 1,000 entries of Seedlist
      FILES</atom:title>
  <wplc:action do="update"/>
  <wplc:fieldInfo id="title" name="Title" type="string"
      contentSearchable="true" fieldSearchable="true"
      parametric="false" returnable="true" sortable="false"
      supportsExactMatch="false"/>
  <wplc:fieldInfo id="author" name="Owner's directory id"
      type="string" contentSearchable="false" fieldSearchable="true"
      parametric="false" returnable="true" sortable="false"
      supportsExactMatch="true"/>
  <wplc:fieldInfo id="published" name="Created timestamp"
      type="date" contentSearchable="false" fieldSearchable="false"
      parametric="true" returnable="true" sortable="false"
      supportsExactMatch="false"/>
  <wplc:fieldInfo id="updated"
      name="Last modification timestamp (major change only, as indicated in UI)"
      type="date" contentSearchable="false" fieldSearchable="false"
      parametric="true" returnable="true" sortable="true"
      supportsExactMatch="false"/>
  <wplc:fieldInfo id="summary" name="Description" type="string"
      contentSearchable="true" fieldSearchable="true"
      parametric="false" returnable="true" sortable="false"
      supportsExactMatch="false"/>
  <wplc:fieldInfo id="tag" name="Tag" type="string"
      contentSearchable="true" fieldSearchable="true"
      parametric="false" returnable="true" sortable="false"
      supportsExactMatch="false"/>
  <wplc:fieldInfo id="commentCount" name="Number of comments"
      type="int" contentSearchable="false" fieldSearchable="false"
      parametric="true" returnable="true" sortable="true"
      supportsExactMatch="true"/>
  <wplc:fieldInfo id="downloadCount" name="Number of downloads"
      type="int" contentSearchable="false" fieldSearchable="false"
      parametric="true" returnable="true" sortable="true"
      supportsExactMatch="true"/>
  <wplc:fieldInfo id="recommendCount"
      name="Number of recommendations" type="int"
      contentSearchable="false" fieldSearchable="false"
      parametric="true" returnable="true" sortable="true"
      supportsExactMatch="true"/>
  <wplc:fieldInfo id="fileUpdated"
      name="Binary file last modification timestamp" type="date"
      contentSearchable="false" fieldSearchable="false"
      parametric="true" returnable="true" sortable="true"
      supportsExactMatch="true"/>
  <wplc:fieldInfo id="fileSize" name="Binary file size" type="int"
      contentSearchable="false" fieldSearchable="false"
      parametric="true" returnable="true" sortable="false"
      supportsExactMatch="true"/>
  <wplc:fieldInfo id="fileName" name="File name" type="string"
      contentSearchable="true" fieldSearchable="true"
      parametric="false" returnable="true" sortable="true"
      supportsExactMatch="false"/>
  <wplc:fieldInfo id="sharedWithUser"
      name="Shared with user's directory id" type="string"
      contentSearchable="false" fieldSearchable="true"
      parametric="false" returnable="true" sortable="false"
      supportsExactMatch="true"/>
  <wplc:fieldInfo id="sharedWithUserName"
      name="Shared with user's name" type="string"
      contentSearchable="false" fieldSearchable="false"
      parametric="false" returnable="true" sortable="false"
      supportsExactMatch="false"/>
  <wplc:fieldInfo id="libraryId"
      name="The id of library owning the file" type="string"
      contentSearchable="false" fieldSearchable="true"
      parametric="false" returnable="true" sortable="false"
      supportsExactMatch="true"/>
  <wplc:fieldInfo id="ORGANISATIONAL_ID"
      name="The id of the organization the owning user belongs to"
      type="string" contentSearchable="false" fieldSearchable="true"
      parametric="false" returnable="true" sortable="false"
      supportsExactMatch="true"/>
  <wplc:fieldInfo id="communityId"
      name="The id of the community associated to the file"
      type="string" contentSearchable="false" fieldSearchable="true"
      parametric="false" returnable="true" sortable="false"
      supportsExactMatch="true"/>
  <wplc:fieldInfo id="containerType"
      name="The type of the container (library) associated to the file"
      type="string" contentSearchable="false" fieldSearchable="true"
      parametric="false" returnable="true" sortable="false"
      supportsExactMatch="true"/>
  <wplc:fieldInfo id="ATOMAPISOURCE" name="Atom API link"
      type="string" 

Re: Can the solr dataimporthandler consume an atom feed?

2014-03-24 Thread eShard
I confirmed the XPath is correct with a third-party XPath visualizer.
/atom:feed/atom:entry parses the xml correctly.

Can anyone confirm or deny that the dataimporthandler can handle an atom
feed?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Can-the-solr-dataimporthandler-consume-an-atom-feed-tp4126134p4126672.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Can the solr dataimporthandler consume an atom feed?

2014-03-24 Thread eShard
Ok, I found one typo:
the links need to be this: /atom:feed/atom:entry/atom:link/@href
But the import still doesn't work... :(

I guess I have to convert the feed over to RSS 2.0
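For anyone checking an XPath like this offline: the atom elements live in the http://www.w3.org/2005/Atom namespace, so the prefix has to be bound when testing. A quick sketch with Python's standard-library ElementTree, which supports only a subset of XPath (the feed snippet and URL here are made up, not the real IBM feed):

```python
import xml.etree.ElementTree as ET

FEED = """<atom:feed xmlns:atom="http://www.w3.org/2005/Atom">
  <atom:entry>
    <atom:title>example file</atom:title>
    <atom:link href="https://example.invalid/doc1"/>
  </atom:entry>
</atom:feed>"""

# Bind the "atom" prefix to its namespace URI before evaluating paths.
ns = {"atom": "http://www.w3.org/2005/Atom"}
root = ET.fromstring(FEED)

# Equivalent of /atom:feed/atom:entry/atom:link/@href: select the link
# element relative to the feed root, then read its href attribute.
hrefs = [link.get("href") for link in root.findall("atom:entry/atom:link", ns)]
print(hrefs)
```

A full XPath engine (e.g. the visualizer mentioned above) can evaluate the `/@href` step directly; ElementTree needs the attribute read as a separate step.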



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Can-the-solr-dataimporthandler-consume-an-atom-feed-tp4126134p4126691.html
Sent from the Solr - User mailing list archive at Nabble.com.


Can the solr dataimporthandler consume an atom feed?

2014-03-21 Thread eShard
Good afternoon,
I'm using solr 4.0 Final.
I have an IBM Atom feed I'm trying to index but it won't work.
There are no errors in the log.
All the other DIH configurations I've created consumed RSS 2.0.
Does it NOT work with an Atom feed?

here's my configuration:
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
  <dataSource type="URLDataSource"/>
  <document>
    <entity name="C3Files_from_Seedlist"
        pk="id"
        url="https://[redacted]"
        processor="XPathEntityProcessor"
        forEach="/atom:feed/atom:entry"
        transformer="DateFormatTransformer,TemplateTransformer">

      <field column="id" xpath="/atom:feed/atom:entry/atom:link@href"/>
      <field column="link" xpath="/atom:feed/atom:entry/atom:link@href"/>
      <field column="c3filetitle"
          xpath="/atom:feed/atom:entry/atom:title"/>

      <field column="author"
          xpath="/atom:feed/atom:entry/atom:author"/>

      <field column="authoremail"
          xpath="/atom:feed/atom:entry/atom:author/atom:email"/>

      <field column="published"
          xpath="/atom:feed/atom:entry/atom:published"
          dateTimeFormat="yyyy-MM-dd"/>
      <field column="updated"
          xpath="/atom:feed/atom:entry/atom:updated"
          dateTimeFormat="yyyy-MM-dd"/>

      <field column="attr_stream_content_type"
          xpath="/atom:feed/atom:entry/atom:link@type"/>
      <field column="index_category" template="ConnectionsFiles"/>

    </entity>
  </document>
</dataConfig>



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Can-the-solr-dataimporthandler-consume-an-atom-feed-tp4126134.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SEVERE: org.apache.solr.common.SolrException: no field name specified in query and no default specified via 'df' param

2014-03-05 Thread eShard
Hi Erick,
  Let me make sure I understand you:
I'm NOT running SolrCloud, so I just have to put the default field in ALL of
my solrconfig.xml files and then restart and that should be it?
Thanks for your reply,





--
View this message in context: 
http://lucene.472066.n3.nabble.com/SEVERE-org-apache-solr-common-SolrException-no-field-name-specified-in-query-and-no-default-specifiem-tp4120789p4121495.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SEVERE: org.apache.solr.common.SolrException: no field name specified in query and no default specified via 'df' param

2014-03-05 Thread eShard
OK, I updated all of my solrconfig.xml files and restarted the Tomcat server,
AND the errors are still there on 2 out of 10 cores.
Am I not reloading correctly?

Here's my /browse handler:
 <requestHandler name="/browse" class="solr.SearchHandler">
   <lst name="defaults">
     <str name="echoParams">explicit</str>

     <str name="wt">velocity</str>
     <str name="v.template">browse</str>
     <str name="v.layout">layout</str>
     <str name="title">Solritas</str>

     <str name="defType">edismax</str>
     <str name="qf">
       text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
     </str>
     <str name="df">text</str>
     <str name="mm">100%</str>
     <str name="q.alt">*:*</str>
     <str name="rows">10</str>
     <str name="fl">*,score</str>

     <str name="mlt.qf">
       text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
     </str>
     <str name="mlt.fl">text,features,name,sku,id,manu,cat</str>
     <int name="mlt.count">3</int>

     <str name="facet">on</str>
     <str name="facet.field">cat</str>
     <str name="facet.field">manu_exact</str>
     <str name="facet.query">ipod</str>
     <str name="facet.query">GB</str>
     <str name="facet.mincount">1</str>
     <str name="facet.pivot">cat,inStock</str>
     <str name="facet.range.other">after</str>
     <str name="facet.range">price</str>
     <int name="f.price.facet.range.start">0</int>
     <int name="f.price.facet.range.end">600</int>
     <int name="f.price.facet.range.gap">50</int>
     <str name="facet.range">popularity</str>
     <int name="f.popularity.facet.range.start">0</int>
     <int name="f.popularity.facet.range.end">10</int>
     <int name="f.popularity.facet.range.gap">3</int>
     <str name="facet.range">manufacturedate_dt</str>
     <str
       name="f.manufacturedate_dt.facet.range.start">NOW/YEAR-10YEARS</str>
     <str name="f.manufacturedate_dt.facet.range.end">NOW</str>
     <str name="f.manufacturedate_dt.facet.range.gap">+1YEAR</str>
     <str name="f.manufacturedate_dt.facet.range.other">before</str>
     <str name="f.manufacturedate_dt.facet.range.other">after</str>

     <str name="hl">on</str>
     <str name="hl.fl">text features name</str>
     <str name="f.name.hl.fragsize">0</str>
     <str name="f.name.hl.alternateField">name</str>

     <str name="spellcheck">on</str>
     <str name="spellcheck.extendedResults">false</str>
     <str name="spellcheck.count">5</str>
     <str name="spellcheck.alternativeTermCount">2</str>
     <str name="spellcheck.maxResultsForSuggest">5</str>
     <str name="spellcheck.collate">true</str>
     <str name="spellcheck.collateExtendedResults">true</str>
     <str name="spellcheck.maxCollationTries">5</str>
     <str name="spellcheck.maxCollations">3</str>
   </lst>

   <arr name="last-components">
     <str>spellcheck</str>
     <str>manifoldCFSecurity</str>
   </arr>
 </requestHandler>



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SEVERE-org-apache-solr-common-SolrException-no-field-name-specified-in-query-and-no-default-specifiem-tp4120789p4121502.html
Sent from the Solr - User mailing list archive at Nabble.com.


RegexTransformer and xpath in DataImportHandler

2014-03-03 Thread eShard
Good afternoon,
I have this DIH:
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
  <dataSource type="URLDataSource"/>
  <document>
    <entity name="blogFeed"
        pk="id"
        url="https://redacted/"
        processor="XPathEntityProcessor"
        forEach="/rss/channel/item"
        transformer="DateFormatTransformer,TemplateTransformer,RegexTransformer">

      <field column="id" xpath="/rss/channel/item/id"/>
      <field column="link" xpath="/rss/channel/item/link"/>
      <field column="blogtitle" xpath="/rss/channel/item/title"/>
      <field column="short_blogtitle" xpath="/rss/channel/item/title"/>
      <field column="short_blogtitle" regex="^(.{250})([^\.]*\.)(.*)$"
          replaceWith="$1" sourceColName="blogtitle"/>
      <field column="pubdateiso" xpath="/rss/channel/item/pubDate"
          dateTimeFormat="yyyy-MM-dd"/>
      <field column="category" xpath="/rss/channel/item/category"/>
      <field column="author" xpath="/rss/channel/item/author"/>
      <field column="authoremail"
          xpath="/rss/channel/item/authoremail"/>
      <field column="content" xpath="/rss/channel/item/content"/>
      <field column="summary" xpath="/rss/channel/item/summary"/>
      <field column="index_category" template="ConnectionsBlogs"/>

    </entity>
  </document>
</dataConfig>

I can't seem to populate BOTH blogtitle and short_blogtitle with the same
xpath.
I can only do one or the other; why can't I put the same xpath in 2
different fields?
I removed the short_blogtitle (with the xpath statement) and left in the
regex statement and blogtitle gets populated and short_blogtitle goes to my
update.chain (to the auto complete index) but the field itself is blank in
this index.

If I leave the dih as above, then blogtitle doesn't get populated but
short_blogtitle does.

What am I doing wrong here? Is there a way to populate both? 
And I CANNOT use copyField here because then the update.chain won't work.
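As a side note, here is what that RegexTransformer rule does to a title, sketched in Python (the sample titles are made up): the pattern only matches titles longer than 250 characters, and replaceWith="$1" keeps just the first 250.

```python
import re

# Same pattern as the DIH config: 250 chars, then up to the next period,
# then the rest. Replacing the whole match with group 1 keeps the first 250.
PATTERN = re.compile(r"^(.{250})([^\.]*\.)(.*)$")

def shorten(title):
    return PATTERN.sub(r"\1", title)

long_title = "a" * 260 + ". trailing sentence."
print(len(shorten(long_title)))          # truncated to 250 characters

short_title = "A short blog title."
print(shorten(short_title) == short_title)  # no match, left unchanged
```

One detail worth knowing: because the regex requires at least 250 characters to match, short titles pass through untouched, which is usually the desired behavior for a "short title" column.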

Thanks,





--
View this message in context: 
http://lucene.472066.n3.nabble.com/RegexTransformer-and-xpath-in-DataImportHandler-tp4120946.html
Sent from the Solr - User mailing list archive at Nabble.com.


SEVERE: org.apache.solr.common.SolrException: no field name specified in query and no default specified via 'df' param

2014-03-02 Thread eShard
Hi,
I'm using Solr 4.0 Final (yes, I know I need to upgrade)

I'm getting this error:
SEVERE: org.apache.solr.common.SolrException: no field name specified in
query and no default specified via 'df' param

And I applied this fix: https://issues.apache.org/jira/browse/SOLR-3646 
And unfortunately, the error persists.
I'm using a multi-shard environment and the error is only happening on one
of the shards.
I've already updated about half of the other shards with the missing default
text in /browse but the error persists on that one shard.
Can anyone tell me how to make the error go away?

Thanks,



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SEVERE-org-apache-solr-common-SolrException-no-field-name-specified-in-query-and-no-default-specifiem-tp4120789.html
Sent from the Solr - User mailing list archive at Nabble.com.


Is there a way to get Solr to delete an uploaded document after its been indexed?

2014-01-30 Thread eShard
Hi,
My crawler uploads all the documents to Solr for indexing to a tomcat/temp
folder.  
Over time this folder grows so large that I run out of disk space.  
So, I wrote a bash script to delete the files and put it in the crontab.
However, if I delete the docs too soon, it doesn't get indexed; too late and
I run out of disk.
I'm still trying to find the right window...
So, (and this is probably a long shot)  I'm wondering if there's anything in
Solr that can delete these docs from /temp after they've been indexed...
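Absent a Solr-side hook, a safer variant of the cron approach is to delete by file age rather than on a fixed schedule, so only files older than the worst-case indexing window are removed. A sketch (the directory path and the one-hour threshold in the demo are assumptions):

```python
import os
import tempfile
import time

def delete_older_than(directory, max_age_seconds):
    """Remove regular files whose mtime is older than max_age_seconds;
    return the paths that were deleted."""
    cutoff = time.time() - max_age_seconds
    deleted = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
            os.remove(path)
            deleted.append(path)
    return deleted

# Self-contained demo in a scratch directory (a stand-in for tomcat/temp):
demo_dir = tempfile.mkdtemp()
old_file = os.path.join(demo_dir, "crawled-two-hours-ago.pdf")
new_file = os.path.join(demo_dir, "still-being-indexed.pdf")
for f in (old_file, new_file):
    with open(f, "w") as fh:
        fh.write("dummy")
two_hours_ago = time.time() - 2 * 3600
os.utime(old_file, (two_hours_ago, two_hours_ago))

removed = delete_older_than(demo_dir, max_age_seconds=3600)
print(removed)  # only the two-hour-old file is deleted
```

Run from cron, this makes the "window" explicit: set max_age_seconds comfortably above the longest observed crawl-to-index delay instead of guessing a deletion time.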

Thank you,




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-there-a-way-to-get-Solr-to-delete-an-uploaded-document-after-its-been-indexed-tp4114463.html
Sent from the Solr - User mailing list archive at Nabble.com.


How to get phrase recipe working?

2014-01-21 Thread eShard
Good morning,
  In  the Apache Solr 4 cookbook, p 112 there is a recipe for setting up
phrase searches; like so:
<fieldType name="text_phrase" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory"
        language="English"/>
  </analyzer>
</fieldType>

I ran a sample query, q=text_ph:"a-z index", and it didn't work very well at
all.
Is there a better way to do phrase searches? 
I need a specific configuration to follow/use.
Thanks,



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-get-phrase-recipe-working-tp4112484.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: ODP: How to get phrase recipe working?

2014-01-21 Thread eShard
Thanks, I'll remove the snowball filter and give it a try.
I guess I'm looking for an exact phrase match to start. (Is that the
standard phrase search?)
Is there something better or more versatile?
Btw, great job on the book!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/ODP-How-to-get-phrase-recipe-working-tp4112491p4112511.html
Sent from the Solr - User mailing list archive at Nabble.com.


Can I combine standardtokenizer with solr.WordDelimiterFilterFactory?

2013-11-01 Thread eShard
Good morning,
Here's the issue: 
I have an ID that consists of two letters and a number.
The whole user title looks like this: Lastname, Firstname (LA12345).
Now, with my current configuration, I can search for LA12345 and find the
user. 
However, when I type in just the number I get zero results.
If I put a wildcard in (*12345) I find the correct record.  
The problem is I changed that user title field to use the
WordDelimiterFilterFactory and it seems to work.
However, I also copy that field into the text field, which just uses the
StandardTokenizer, and I lose the ability to search for 12345 without a
wildcard.
My question is: can (or should) I put the WordDelimiterFilterFactory in with
the StandardTokenizer in the text field?
Or should I just use one or the other?
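For intuition, here's a rough model of why the two fields behave differently: WordDelimiterFilterFactory splits on letter/digit transitions (and, with preserveOriginal, keeps the whole token too), so "LA12345" indexes tokens that a standard tokenizer alone never produces. This is a simplified illustration, not the actual filter:

```python
import re

def word_delimiter_tokens(token, preserve_original=True):
    """Simplified WordDelimiterFilter: split on letter<->digit transitions,
    optionally keeping the original token (preserveOriginal="1")."""
    parts = re.findall(r"[A-Za-z]+|[0-9]+", token)
    tokens = [token] if preserve_original and len(parts) > 1 else []
    return tokens + parts

def standard_tokens(token):
    """A standard tokenizer keeps 'LA12345' as one alphanumeric token."""
    return [token]

print(word_delimiter_tokens("LA12345"))  # original plus the split parts
print(standard_tokens("LA12345"))        # single token only
```

Since only the first field indexes a standalone "12345" token, a bare query for 12345 can match there but needs a wildcard against the copy analyzed the second way.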
Thank you,




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Can-I-combine-standardtokenizer-with-solr-WordDelimiterFilterFactory-tp4098814.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Configuration and specs to index a 1 terabyte (TB) repository

2013-10-30 Thread eShard
Wow again! 
Thank you all very much for your insights.  
We will certainly take all of this under consideration.

Erik: I want to upgrade but unfortunately, it's not up to me. You're right,
we definitely need to do it.  
And SolrJ sounds interesting, thanks for the suggestions.

By the way, is there a Solr upgrade guide out there anywhere?


Thanks again!





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Configuration-and-specs-to-index-a-1-terabyte-TB-repository-tp4098227p4098431.html
Sent from the Solr - User mailing list archive at Nabble.com.


Configuration and specs to index a 1 terabyte (TB) repository

2013-10-29 Thread eShard
Good morning,
I have a 1 TB repository with approximately 500,000 documents (that will
probably grow from there) that needs to be indexed.  
I'm limited to Solr 4.0 final (we're close to beta release, so I can't
upgrade right now) and I can't use SolrCloud because work currently won't
allow it for some reason.

I found this configuration from this link:
http://lucene.472066.n3.nabble.com/Can-Apache-Solr-Handle-TeraByte-Large-Data-td3656484.html#a3657056
 
He said he was able to index 1 TB on a single server with 40 cores and 128
GB of RAM with 10 shards.

Is this my only option? Or is there a better configuration?
Is there some formula for calculating server specifications (this much data
and documents equals this many cores, RAM, hard disk space etc)?
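There's no exact formula, but a back-of-envelope sketch of the kind of arithmetic people use is easy to write down. All the ratios below are placeholder assumptions to illustrate the method, not measured numbers; real values depend heavily on schema, stored fields, and file types:

```python
def sizing_sketch(raw_tb, num_shards, index_ratio=0.3, hot_fraction=0.25):
    """Back-of-envelope shard sizing. index_ratio is an assumed
    index-size-to-raw-data ratio; hot_fraction is the portion of each
    shard's index you'd like cached in RAM. Both are illustrative guesses."""
    index_gb = raw_tb * 1024 * index_ratio      # estimated total index size
    per_shard_gb = index_gb / num_shards        # index size per shard
    ram_per_shard_gb = per_shard_gb * hot_fraction
    return index_gb, per_shard_gb, ram_per_shard_gb

index_gb, per_shard, ram = sizing_sketch(raw_tb=1, num_shards=10)
print(index_gb, per_shard, ram)
```

The honest answer from the thread still applies: the only reliable ratios come from indexing a representative sample of your own data and extrapolating.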

Thanks,




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Configuration-and-specs-to-index-a-1-terabyte-TB-repository-tp4098227.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Configuration and specs to index a 1 terabyte (TB) repository

2013-10-29 Thread eShard
Wow, thanks for your response.
You raise a lot of great questions; I wish I had the answers!
We're still trying to get enough resources to finish crawling the
repository, so I don't even know what the final size of the index will be.
I've thought about excluding the videos and other large files and using a
data import handler to just send the meta data but there are problems no
matter where I turn.  
I'm taking what you said back to the server team for deliberation.
Thanks again for your insights





Re: Configuration and specs to index a 1 terabyte (TB) repository

2013-10-29 Thread eShard
P.S. 
Offhand, how do I control how much of the index is held in RAM?
Can you point me in the right direction?
Thanks,





how to manually update a field in the index without re-crawling?

2013-10-01 Thread eShard
Good morning,
I'm currently using Solr 4.0 FINAL.
I indexed a website and it took over 24 hours to crawl.
I just realized I need to rename one of the fields (or add a new one). 
so I added the new field to the schema,
But how do I copy the data over from the old field to the new field without
recrawling everything?

Is this possible?

I was thinking about maybe putting an update chain processor in the /update
handler but I'm not sure that will work.
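If the updateLog is enabled and all fields in the schema are stored, Solr 4.0's atomic updates can set the new field per document without recrawling. A minimal sketch of the request body (the doc id and field names are hypothetical):

```python
import json

def atomic_copy_update(doc_id, new_field, value):
    """Build a Solr 4.x atomic-update document that sets `new_field`
    on an existing doc without resending its other fields.
    (Requires updateLog enabled and all other fields stored.)"""
    return {"id": doc_id, new_field: {"set": value}}

# Hypothetical id/field/value for illustration; POST `body` to
# /solr/<core>/update?commit=true with Content-Type: application/json.
body = json.dumps([atomic_copy_update("http://example.com/page1",
                                      "new_title", "old title value")])
print(body)
```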

Thanks,






Re: Solr 4.0 is stripping XML format from RSS content field

2013-10-01 Thread eShard
If anyone is interested, I managed to resolve this a long time ago.
I used a Data Import Handler instead and it worked beautifully.
DIH are very forgiving and it takes what ever XML data is there and injects
it into the Solr Index.
It's a lot faster than crawling too.
You use XPATH to map the fields to your schema.





QueryElevationComponent results only show up with debug = true

2013-08-30 Thread eShard
Hi,
I'm using solr 4.0 final built around Dec 2012.
I was initially told that the QEC didn't work for distributed search but
apparently it was fixed.
Anyway, I use the /elevate handler with [elevated] in the field list and I
don't get any elevated results.
elevated=false in the result block.
however, if I turn on debugQuery; the elevated result appears in the debug
section under queryBoost.
Is this the only way you can get elevated results?
Because before (and I can't remember if this was before or after I went to
4.0 Final) I would get the elevated results mixed in with the regular
results in the result block.
elevated=true was the only way to tell them apart.
I also tried forceElevation, enableElevation, exclusive but there is still
no elevated results in the result block.
What am I doing wrong?
query:
http://localhost:8080/solr/Profiles/elevate?q=gangnam+style&fl=*,[elevated]&wt=xml&start=0&rows=100&enableElevation=true&forceElevation=true&df=text&qt=edismax&debugQuery=true
Here's my config:
  <searchComponent name="elevator" class="solr.QueryElevationComponent">
    <str name="queryFieldType">text_general</str>
    <str name="config-file">elevate.xml</str>
  </searchComponent>

  <requestHandler name="/elevate" class="solr.SearchHandler" startup="lazy">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="df">text</str>
    </lst>
    <arr name="last-components">
      <str>elevator</str>
    </arr>
  </requestHandler>

elevate.xml:
<elevate>
  <query text="gangnam style">
    <doc id="https://server/profiles/html/profileView.do?key=ec6de388-fb7a-4397-9ed5-51760e4333d3" />
  </query>
</elevate>






Re: QueryElevationComponent results only show up with debug = true

2013-08-30 Thread eShard
Sure,
Here are the results with the debugQuery=true; with debugging off, there are
no results.
The elevated result appears in the queryBoost section but not in the result
section:
<?xml version="1.0" encoding="utf-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="enableElevation">true</str>
      <str name="wt">xml</str>
      <str name="rows">100</str>
      <str name="fl">*,[elevated]</str>
      <str name="df">text</str>
      <str name="debugQuery">true</str>
      <str name="start">0</str>
      <str name="q">gangnam</str>
      <str name="forceElevation">true</str>
      <str name="qt">edismax</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
  <lst name="debug">
    <lst name="queryBoosting">
      <str name="q">gangnam</str>
      <arr name="match">
        <str>https://server/profiles/html/profileView.do?key=ec6de388-fb7a-4397-9ed5-51760e4333d3</str>
      </arr>
    </lst>
    <str name="rawquerystring">gangnam</str>
    <str name="querystring">gangnam</str>
    <str name="parsedquery">(text:gangnam ((id:https://server/profiles/html/profileView.do?key=ec6de388-fb7a-4397-9ed5-51760e4333d3)^0.0))/no_coord</str>
    <str name="parsedquery_toString">text:gangnam ((id:https://server/profiles/html/profileView.do?key=ec6de388-fb7a-4397-9ed5-51760e4333d3)^0.0)</str>
    <lst name="explain" />
    <str name="QParser">LuceneQParser</str>
    <lst name="timing">
      <double name="time">0.0</double>
      <lst name="prepare">
        <double name="time">0.0</double>
        <lst name="org.apache.solr.handler.component.QueryComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.FacetComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.HighlightComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.StatsComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.QueryElevationComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.DebugComponent">
          <double name="time">0.0</double>
        </lst>
      </lst>
      <lst name="process">
        <double name="time">0.0</double>
        <lst name="org.apache.solr.handler.component.QueryComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.FacetComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.HighlightComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.StatsComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.QueryElevationComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.DebugComponent">
          <double name="time">0.0</double>
        </lst>
      </lst>
    </lst>
  </lst>
</response>






Re: QueryElevationComponent results only show up with debug = true

2013-08-30 Thread eShard
I can guarantee you that the ID is unique and it exists in that index.





Can a data import handler grab all pages of an RSS feed?

2013-08-26 Thread eShard
Good morning,
I have an IBM Portal atom feed that spans multiple pages.
Is there a way to instruct the DIH to grab all available pages?
I can put a huge range in but that can be extremely slow with large amounts
of XML data.
I'm currently using Solr 4.0 final.

Thanks,





Re: DIH : Unexpected character '=' (code 61); expected a semi-colon after the reference for entity 'st'

2013-08-25 Thread eShard
I just resolved this same error.
The problem was that I had a lot of ampersands (&) that were un-escaped in
my XML doc.
There was nothing wrong with my DIH; it was the xml doc it was trying to
consume.
I just used StringEscapeUtils.escapeXml from apache to resolve...
Another big help was the Eclipse XML validation engine. 
Just add your doc to an existing project and right click anywhere on the doc
and select validate from the menu.
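For anyone preprocessing feeds outside the JVM, Python's `xml.sax.saxutils.escape` does the same job as `StringEscapeUtils.escapeXml` for the basic entities. A quick sketch:

```python
from xml.sax.saxutils import escape

# Escape bare &, <, > before handing the document to the DIH,
# the Python equivalent of StringEscapeUtils.escapeXml's common cases.
raw = "Research & Development <lab>"
print(escape(raw))   # -> Research &amp; Development &lt;lab&gt;
```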







Indexoutofbounds size: 9 index: 8 with data import handler

2013-08-15 Thread eShard
Good morning,
I'm using solr 4.0 final on tomcat 7.0.34 on linux
I created 3 new data import handlers to consume 3 RSS feeds.
They seemed to work perfectly.
However, today, I'm getting these errors:
10:42:17  SEVERE  SolrCore  java.lang.IndexOutOfBoundsException: Index: 9, Size: 8
10:42:17  SEVERE  SolrDispatchFilter  null:java.lang.IndexOutOfBoundsException: Index: 9, Size: 8
10:42:17  SEVERE  SolrCore  org.apache.solr.common.SolrException: Server at https://search:7443/solr/Communities returned non ok status:500, message:Internal Server Error
10:42:17  SEVERE  SolrDispatchFilter  null:org.apache.solr.common.SolrException: Server at https://search/solr/Communities returned non ok status:500, message:Internal Server Error

I read that the index is corrupt so I deleted it and restarted and then the
same errors jumped to the next core with the DIH for the RSS feed.

How do I fix this?

Here's my dih in solrconfig.xml
  <requestHandler name="/DIHCommunityFeed"
                  class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">dih-comm-feed.xml</str>
      <str name="update.chain">SemaAC</str>
    </lst>
  </requestHandler>

Here's the dih config
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
  <dataSource type="URLDataSource" />
  <document>
    <entity name="communitiesFeed"
            pk="id"
            url="https://search/C3CommunityFeedDEV/"
            processor="XPathEntityProcessor"
            forEach="/rss/channel/item"
            transformer="DateFormatTransformer">

      <field column="id" xpath="/rss/channel/item/id" />
      <field column="link" xpath="/rss/channel/item/link" />
      <field column="communitytitle" xpath="/rss/channel/item/title" />
      <field column="pubdateiso" xpath="/rss/channel/item/pubDate" dateTimeFormat="yyyy-MM-dd" />
      <field column="category" xpath="/rss/channel/item/category" />
      <field column="author" xpath="/rss/channel/item/author" />
      <field column="authoremail" xpath="/rss/channel/item/authoremail" />
      <field column="content" xpath="/rss/channel/item/content" />
      <field column="summary" xpath="/rss/channel/item/summary" />
    </entity>
  </document>
</dataConfig>

Here's a partial of my schema:
   <field name="title" type="text_general" indexed="true" stored="true" multiValued="true"/>
   <field name="subject" type="text_general" indexed="true" stored="true"/>
   <field name="description" type="text_general" indexed="true" stored="true"/>
   <field name="comments" type="text_general" indexed="true" stored="true"/>
   <field name="author" type="text_general" indexed="true" stored="true"/>
   <field name="authoremail" type="text_general" indexed="true" stored="true"/>
   <field name="keywords" type="text_general" indexed="true" stored="true"/>
   <field name="category" type="text_general" indexed="true" stored="true"/>
   <field name="content_type" type="string" indexed="true" stored="true" multiValued="true"/>
   <field name="last_modified" type="date" indexed="true" stored="true"/>
   <field name="links" type="string" indexed="true" stored="true" multiValued="true"/>

   <field name="solr.title" type="string" indexed="true" stored="true" multiValued="false" />
   <field name="communitytitle" type="string" indexed="true" stored="true" multiValued="false" />
   <field name="content" type="string" indexed="true" stored="true" multiValued="true"/>

   <field name="pubdateiso" type="date" dateTimeFormat="yyyy-MM-dd" indexed="true" stored="true" multiValued="true"/>
   <field name="link" type="string" indexed="true" stored="true" multiValued="true"/>
   <field name="summary" type="text_general" indexed="true" stored="true"/>
   <field name="published" type="date" indexed="true" stored="true" multiValued="true" />
   <field name="updated" type="date" indexed="true" stored="true" multiValued="true" />

   <copyField source="link" dest="text"/>
   <copyField source="description" dest="text"/>
   <copyField source="communitytitle" dest="text"/>
   <copyField source="communitytitle" dest="solr.title"/>
   <copyField source="content" dest="text"/>
   <copyField source="author" dest="text"/>
   <copyField source="authoremail" dest="text"/>
   <copyField source="summary" dest="text"/>





Re: Indexoutofbounds size: 9 index: 8 with data import handler

2013-08-15 Thread eShard
Ok, these errors seem to be caused by passing incorrect parameters in a
search query.
Such as: spellcheck=extendedResults=true 
instead of 
spellcheck.extendedResults=true

Thankfully, it seems to have nothing to do with the DIH at all.





Re: How to parse multivalued data into single valued fields?

2013-08-08 Thread eShard
Ok, I have one index called Communities from an RSS feed.
each item in the feed has multiple titles (which are all the same for this
feed) 
So, the title needs to be cleaned up before it is put into the community
index
let's call the field community_title;
And then an UpdateProcessorChain needs to fire and it takes community_title
and puts it into another index for auto completion suggestions called
SolrAC.

Does that make sense?
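A sketch of what that could look like in solrconfig.xml, assuming the Solr 4.x field-mutating update processors; `ac_suggest` is a hypothetical destination field, and pushing the value into the separate SolrAC core would still need its own indexing step:

```xml
<updateRequestProcessorChain name="clean-community-title">
  <!-- keep only the first of the duplicate title values -->
  <processor class="solr.FirstFieldValueUpdateProcessorFactory">
    <str name="fieldName">community_title</str>
  </processor>
  <!-- copy the cleaned value into a field the autocomplete logic can use -->
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">community_title</str>
    <str name="dest">ac_suggest</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```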





How to parse multivalued data into single valued fields?

2013-08-07 Thread eShard
Hi,
I'm currently using solr 4.0 final with Manifoldcf v1.3 dev.
I have multivalued titles (the names are all the same so far) that must go
into a single valued field.
Can a transformer do this?
Can anyone show me how to do it?

And this has to fire off before an update chain takes place.

Thanks,





how to improve (keyword) relevance?

2013-07-22 Thread eShard
Good morning,
I'm currently running Solr 4.0 final (multi core) with manifoldcf v1.3 dev
on tomcat 7.
Early on, I used copyfield to put the meta data into the text field to
simplify solr queries (i.e. I only have to query one field now.)
However, a lot people are concerned about improving relevance.
I found a relevancy solution on page 298 of the Apache Solr 4.0 Cookbook;
however is there a way to modify it so it only uses one field? (i.e. the
text field?) 

(Note well: I have multi cores and the schemas are all somewhat different;
If I can't get this to work with one field then I would have to build
complex queries for all the other cores; this would vastly over complicate
the UI. Is there another way?)
here's the requesthandler in question:
<requestHandler name="/better" class="solr.StandardRequestHandler">
  <lst name="defaults">
    <str name="indent">true</str>
    <str name="q">_query_:"{!edismax qf=$qfQuery mm=$mmQuery pf=$pfQuery bq=$boostQuery v=$mainQuery}"</str>
    <str name="qfQuery">name^10 description</str>
    <str name="mmQuery">1</str>
    <str name="pfQuery">name description</str>
    <str name="boostQuery">_query_:"{!edismax qf=$boostQuerQf mm=100% v=$mainQuery}"^10</str>
  </lst>
</requestHandler>






Re: how to improve (keyword) relevance?

2013-07-22 Thread eShard
Sure, let's say the user types in test pdf;
we need the results with all the query words to be near the top of the
result set.
the query will look like this: /select?q=text%3Atest+pdf&wt=xml

How do I ensure that the top resultset contains all of the query words?
How can I boost the first (or second) term when they are both in the same
field (i.e. text)?

Does this make sense?

Please bear with me; I'm still new to the solr query syntax so I don't even
know if I'm asking the right question. 
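One way to get the "all query words near the top" behavior is the edismax parser with mm (minimum-should-match) set to 100%. A sketch that just builds the request URL (the core name and qf field are placeholders):

```python
from urllib.parse import urlencode

params = {
    "q": "test pdf",
    "defType": "edismax",  # defType (not qt) selects the query parser
    "qf": "text",
    "mm": "100%",          # require every query word to match
    "wt": "xml",
}
query = "/solr/core/select?" + urlencode(params)
print(query)
```

Note that urlencode also takes care of escaping characters like % and spaces in the final URL.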

Thanks,





Is there a way to capture div tag by id?

2013-06-25 Thread eShard
let's say I have a div with id=myDiv
Is there a way to set up the solr upate/extract handler to capture just that
particular div?





how do I capture h1 tags?

2013-06-24 Thread eShard
I'm currently running solr 4.0 final with manifoldcf 1.3 dev on tomcat 7.
I need to capture the h1 tags on each web page as that is the true title
for the lack of a better word.
I can't seem to get it to work at all. 
I read the instructions and used the capture component and then mapped it to
a field named h1 in the schema.
Here's my update/extract handler:

<requestHandler name="/update/extract"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.content">text</str>
    <str name="fmap.title">solr.title</str>
    <str name="fmap.name">solr.name</str>
    <str name="capture">h1</str>
    <str name="fmap.h1">h1</str>

    <str name="description">comments</str>

    <str name="fmap.Last-Modified">last_modified</str>
    <str name="uprefix">attr_</str>
    <str name="lowernames">true</str>
  </lst>
</requestHandler>
Can anyone tell me what I doing wrong?





Re: how do I capture h1 tags?

2013-06-24 Thread eShard
Ok, I figured it out:
you need to add this too:

<str name="captureAttr">true</str>





How to store the document folder path in solr?

2013-05-15 Thread eShard
Good afternoon,
I'm using solr 4.0 final with manifoldcf v1.2dev on tomcat 7.0.34
today, a user asked a great question. What if I only know the name of the
folder that the documents are in?
Can I just search on the folder name?
Currently, I'm only indexing documents; how do I capture the folder name (or
full path) and store it?
I have a variety of repositories: web, rss, livelink (I can get the folder
hierarchy for this); I guess indexing a file share would be straight forward
and the path readily available but I haven't been asked to index those yet.
I'll try to run some tests on network file shares...

Can anyone point me in the right direction?

Thanks,






How to aggregate data in solr 4.0?

2013-05-15 Thread eShard
Good afternoon,
Does anyone know of a good tutorial on how to perform SQL like aggregation
in solr queries?
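Solr 4.0 has no general SQL aggregation, but the StatsComponent (min/max/sum/mean per field) and facets (GROUP BY-style counts) cover common cases. A sketch that just builds such a query; the field names are hypothetical:

```python
from urllib.parse import urlencode

# stats.field gives SUM/AVG/MIN/MAX-style aggregates for a numeric field;
# facet.field gives per-value counts, similar to GROUP BY ... COUNT(*).
params = [
    ("q", "*:*"),
    ("rows", "0"),            # only the aggregates, no documents
    ("stats", "true"),
    ("stats.field", "price"),
    ("facet", "true"),
    ("facet.field", "category"),
]
url = "/solr/core/select?" + urlencode(params)
print(url)
```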

Thanks,






relevance when merging results

2013-04-26 Thread eShard
Hi,
I'm currently using Solr 4.0 final on tomcat v7.0.3x
I have 2 cores (let's call them A and B) and I need to combine them as one
for the UI. 
However we're having trouble on how to best merge these two result sets.
Currently, I'm using relevancy to do the merge. 
For example,
I search for red in both cores.
Core A has a max score of .919856 with 87 results
Core B has a max score or .6532563 with 30 results

I would like to simply merge numerically but I don't know if that's valid.
If I merge in numerical order then Core B results won't appear until element
25 or later.

I initially thought about just taking the top 5 results from each and layer
one on top of the other.

Is there a best practice out there for merging relevancy?
Please advise...
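Raw scores from two cores are not directly comparable, so merging numerically across cores is not really valid. A common (admittedly imperfect) heuristic is to normalize each list by its own max score before sorting the combined list; a sketch using the numbers from this thread:

```python
def merge_by_normalized_score(results_a, results_b):
    """Merge two ranked lists on score/maxScore so a 0.65 hit from a
    'cooler' core can outrank a 0.70 hit from a 'hotter' one.
    Each result is a (doc_id, score) tuple."""
    def normalized(results):
        if not results:
            return []
        max_score = max(score for _, score in results)
        return [(doc, score / max_score) for doc, score in results]
    merged = normalized(results_a) + normalized(results_b)
    return sorted(merged, key=lambda pair: pair[1], reverse=True)

core_a = [("a1", 0.919856), ("a2", 0.45)]
core_b = [("b1", 0.6532563), ("b2", 0.60)]
print(merge_by_normalized_score(core_a, core_b))
```

Max-score normalization is known to be fragile (one freak high score skews the whole list), so interleaving the top-N from each core, as suggested above, is an equally defensible choice.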
Thanks,






Re: How to configure shards with SSL?

2013-04-10 Thread eShard
Ok, 
We figured it out:
The cert wasn't in the trusted CA keystore. I know we put it in there
earlier; I don't know why it was missing.
But we added it in again and everything works as before.

Thanks,






How to configure shards with SSL?

2013-04-09 Thread eShard
Good morning everyone,
I'm running solr 4.0 Final with ManifoldCF v1.2dev on tomcat 7.0.37 and I
had shards up and running over HTTP, but when I migrated to SSL it wouldn't work
anymore.
First I got an IO Exception but then I changed my configuration in
solrconfig.xml to this:
   <requestHandler name="/all" class="solr.SearchHandler">
     <lst name="defaults">
       <str name="echoParams">explicit</str>
       <str name="wt">xml</str>
       <str name="indent">true</str>
       <str name="q.alt">*:*</str>

       <str name="fl">id, solr.title, content, category, link, pubdateiso</str>

       <str name="shards">dev:7443/solr/ProfilesJava/|dev:7443/solr/C3Files/|dev:7443/solr/Blogs/|dev:7443/solr/Communities/|dev:7443/solr/Wikis/|dev:7443/solr/Bedeworks/|dev:7443/solr/Forums/|dev:7443/solr/Web/|dev:7443/solr/Bookmarks/</str>
     </lst>
     <shardHandlerFactory class="HttpShardHandlerFactory">
       <str name="urlScheme">https://</str>
       <int name="socketTimeOut">1000</int>
       <int name="connTimeOut">5000</int>
     </shardHandlerFactory>
   </requestHandler>

And Now I'm getting this error:
org.apache.solr.client.solrj.SolrServerException: No live SolrServers
available to handle this request:
How do I configure shards with SSL?
Thanks,





detailed Error reporting in Solr

2013-04-04 Thread eShard
Good morning,
I'm currently running Solr 4.0 final with tika v1.2 and Manifoldcf v1.2 dev. 
And I'm battling Tika XML parse errors again. 
Solr reports this error:org.apache.solr.common.SolrException:
org.apache.tika.exception.TikaException: XML parse error which is too vague.
I had to manually run the link against the tika app and I got a much more
detailed error.
Caused by: org.xml.sax.SAXParseException; lineNumber: 4; columnNumber: 105;
The entity nbsp was referenced, but not declared.
so there are old school non break space in the html that tika can't handle.

for example: <li>Cyber Systems and Technology&nbsp;&rsaquo; /mission/CST/CST.html</li>

My question is two fold:
1) how do I get solr to report more detailed errors and
2) how do I get tika to accept (or ignore) nbsp?
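For question 2, one workaround is to resolve HTML entities (which are not predefined XML entities) before the content reaches Tika's XML parser; in Python, for example:

```python
import html

# html.unescape resolves HTML-only entities such as &nbsp; and &rsaquo;
# that a strict XML parser rejects, turning them into plain characters.
fragment = "Cyber Systems and Technology&nbsp;&rsaquo;"
print(html.unescape(fragment))
```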

thanks,






Re: detailed Error reporting in Solr

2013-04-04 Thread eShard
ok, one possible fix is to declare the XML equivalent of &nbsp;, which is:
<?xml version="1.0"?>
<!DOCTYPE some_name [
  <!ENTITY nbsp "&#160;">
]>

but how do I add this into the tika configuration?






Re: detailed Error reporting in Solr

2013-04-04 Thread eShard
Yes, that's it exactly.
I crawled a link with these (&nbsp;&rsaquo;) in each list item; solr
couldn't handle it, threw the XML parse error, and the crawler terminated the
job.

Is this fixable? Or do I have to submit a bug to the tika folks?

Thanks,






Re: query builder for solr UI?

2013-02-28 Thread eShard
sorry,
The easiest way to describe it: we want a Google-like experience.
So if the end user types in a phrase, quotes, or operators like + and -
(for AND, NOT), the UI should be flexible enough to build the correct
solr query syntax.

How will edismax help?

And I tried simplifying queries by using the copyfield command to copy all
of the metadata to the text field.
So now the only field we have to query is the text field but I doubt that is
going to be a panacea.

Does that make sense?

Thanks,





Re: query builder for solr UI?

2013-02-28 Thread eShard
Good question,
if the user types in special characters like the dash (-),
how will I know whether to treat it as a literal dash or as the NOT
operator? The first needs to be escaped and URL-encoded; the second doesn't,
and the two produce very different queries.

So I apologize for not being clearer; what I'm really after is making it
easy for the user to communicate exactly what they're looking for, and
encoding their input correctly. That's what I meant by query building.
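A client-side query builder usually needs two steps: escape the characters the Lucene/Solr query parser treats specially (when the user means them literally), then URL-encode the result. A sketch of the first step, following the standard Lucene escaping rules:

```python
import re

# Characters the Lucene/Solr query parser treats specially; escaping
# them makes user input behave as literal text ("-" stops meaning NOT).
LUCENE_SPECIALS = re.compile(r'([+\-!(){}\[\]^"~*?:\\/]|&&|\|\|)')

def escape_user_query(text):
    """Backslash-escape Lucene query-parser metacharacters."""
    return LUCENE_SPECIALS.sub(r'\\\1', text)

print(escape_user_query("up-to-date C++"))
```

The same table translates directly to a small JavaScript/jQuery helper on the UI side.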

Thanks,







query builder for solr UI?

2013-02-27 Thread eShard
Good day,
Currently we are building a front end for solr (in jquery, html, and css)
and I'm struggling with making a query builder that can handle pretty much
whatever the end user types into the search box.
does something like this already exist in javascript/jquery?

Thanks,





Solr 4.0 is stripping XML format from RSS content field

2013-02-11 Thread eShard
Hi,
I'm running solr 4.0 final with manifoldcf 1.1 and I verified via fiddler
that Manifold is indeed sending the content field from a RSS feed that
contains xml data
However, when I query the index the content field is there with just the
data; the XML structure is gone.
Does anyone know how to stop Solr from doing this?
I'm using tika but I don't see it in the update/extract handler.
Can anyone point me in the right direction?

Thanks,






Can you call the elevation component in another requesthandler?

2013-02-07 Thread eShard
Good day,
I got my elevation component working with the /elevate handler. 
However, I would like to add the elevation component to my main search
handler which is currently /query.
so I can have one handler return everything (elevated items with regular
search results; i.e. one stop shopping, so to speak)
This is what I tried:
  <requestHandler name="/query" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="wt">xml</str>
      <str name="indent">true</str>
      <str name="df">text</str>
    </lst>
    <arr name="last-components">
      <str>elevator</str>
      <str>manifoldCFSecurity</str>
    </arr>
  </requestHandler>

I also tried it in first components as well.
Is there any way to combine these? Otherwise the UI will have to make
separate ajax calls and we're trying to minimize that.
Thanks,










Re: Can you call the elevation component in another requesthandler?

2013-02-07 Thread eShard
Update:
Ok, If I search for gangnam style in /query handler by itself, elevation
works!
If I search with gangnam style and/or something else the elevation component
doesn't work but the rest of the query does.

here's the examples:
works:
/query?q=gangnam+style&fl=*,[elevated]&wt=xml&start=0&rows=50&debugQuery=true&dismax=true

elevation fails:
/query?q=gangnam+style+OR+title%3A*White*&fl=*,[elevated]&wt=xml&start=0&rows=50&debugQuery=true&dismax=true

So I guess I have to do separate queries at this point.
Is there a way to combine these 2 request handlers?

Thanks,







Multicore search with ManifoldCF security not working

2013-01-28 Thread eShard
Good morning,
I used this post here to join to search 2 different cores and return one
data set.
http://stackoverflow.com/questions/2139030/search-multiple-solr-cores-and-return-one-result-set
The good news is that it worked!
The bad news is that one of the cores is Opentext and the ManifoldCF
security check isn't firing!
So users could see documents that they aren't supposed to.
The opentext security works if I call the core handler individually. it
fails for the merged result.
I need to find a way to get the AuthenticatedUserName parameter to the
opentext core.
Here's my /query handler for the merged result
  <requestHandler name="/query" class="solr.SearchHandler">

    <lst name="defaults">
      <str name="q.alt">*:*</str>

      <str name="fl">id, attr_general_name, attr_general_owner,
        attr_general_creator, attr_general_modifier, attr_general_description,
        attr_general_creationdate, attr_general_modifydate, solr.title,
        content, category, link, pubdateiso
      </str>
      <str name="shards">localhost:8080/solr/opentext/,localhost:8080/solr/Profiles/</str>
    </lst>
    <arr name="last-components">
      <str>manifoldCFSecurity</str>
    </arr>
  </requestHandler>

As you can see, I tried calling manifoldCFSecurity first and it didn't work. 
I was thinking perhaps I can call the shards directly in the URL and put the
AuthenticatedUserName on the opentext shard but I'm getting pulled in
different directions currently.

Can anyone point me in the right direction?
Thanks,








Re: Multicore search with ManifoldCF security not working

2013-01-28 Thread eShard
I'm sorry, I don't know what you mean.
I clicked on the hidden email link, filled out the form and when I hit
submit; 
I got this error:
Domain starts with dot
Please fix the error and try again.

Who exactly am I sending this to and how do I get the form to work?





How to use SolrAjax with multiple cores?

2013-01-28 Thread eShard
Hi,
I need to build a UI that can access multiple cores. And combine them all on
an Everything tab.
The solrajax example only has 1 core.
How do I setup multicore with solrajax? 
Do I setup 1 manager per core? How much of a performance hit will I take
with multiple managers running?
Is there a better way to do this?
Is there a better UI to use?

Can anyone point me in the right direction?

Thanks,






Re: Searching for field that contains multiple values

2013-01-28 Thread eShard
All I had to do was put a wildcard before and after the search term and it
would succeed: (*Maritime*)
Searching multivalued fields wouldn't work any other way.
Like so:
http://localhost:8080/solr/Blogs/select?q=title%3A*Maritime*&wt=xml

but I'll check out those other suggestions...

Thanks,



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Searching-for-field-that-contains-multiple-values-tp4033944p4036854.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: error initializing QueryElevationComponent

2013-01-25 Thread eShard
In case anyone was wondering, the solution is to HTML-encode the URL.
Solr didn't like the '&'s; just convert them to '&amp;' and it works!
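
For reference, this is what the doc id from the elevate.xml in this thread
looks like once the ampersands are entity-encoded so the XML parses:

```xml
<elevate>
 <query text="foo bar">
  <doc
id="https://opentextdev/cs/llisapi.dll?func=ll&amp;objID=577575&amp;objAction=download"
/>
 </query>
</elevate>
```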



--
View this message in context: 
http://lucene.472066.n3.nabble.com/error-initializing-QueryElevationComponent-tp4035194p4036261.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrException: Error loading class 'org.apache.solr.response.transform.EditorialMarkerFactory'

2013-01-23 Thread eShard
Thanks,
That worked.
So the documentation needs to be fixed in a few places (the solr wiki and
the default solrconfig.xml in Solr 4.0 final; I didn't check any other
versions)
I'll either open a new ticket in JIRA to request a fix or reopen the old
one...

Furthermore,
I tried using the ElevatedMarkerFactory and it didn't behave the way I
thought it would.

this http://localhost:8080/solr/Lisa/elevate?q=foo+bar&wt=xml&defType=dismax
got me all the doc info but no elevated marker

I ran this
http://localhost:8080/solr/Lisa/elevate?q=foo+bar&fl=[elevated]&wt=xml&defType=dismax
and all I got was response = 1 and elevated = true

I had to run this to get all of the above info:
http://localhost:8080/solr/Lisa/elevate?q=foo+bar&fl=*,[elevated]&wt=xml&defType=dismax



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrException-Error-loading-class-org-apache-solr-response-transform-EditorialMarkerFactory-tp4035203p4035621.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrException: Error loading class 'org.apache.solr.response.transform.EditorialMarkerFactory'

2013-01-22 Thread eShard
Good morning,
I can't seem to figure out how to load this class
Can someone please point me in the right direction?
Thank you,




--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrException-Error-loading-class-org-apache-solr-response-transform-EditorialMarkerFactory-tp4035203p4035330.html
Sent from the Solr - User mailing list archive at Nabble.com.


error initializing QueryElevationComponent

2013-01-21 Thread eShard
Hi,
I'm trying to test out the queryelevationcomponent.
elevate.xml is in the solrconfig.xml and it's in the conf directory.
I left the defaults.
I added this to the elevate.xml
<elevate>
 <query text="foo bar">
  <doc
id="https://opentextdev/cs/llisapi.dll?func=ll&objID=577575&objAction=download"
/>
 </query>
</elevate>

id is a string setup as the uniquekey

And I get this error:
16:25:48SEVERE  Config  Exception during parsing file:
elevate.xml:org.xml.sax.SAXParseException; systemId: solrres:/elevate.xml;
lineNumber: 28; columnNumber: 77; The reference to entity objID must end
with the ';' delimiter.
16:25:48SEVERE  SolrCorejava.lang.NullPointerException
16:25:48SEVERE  CoreContainer   Unable to create core: Lisa
16:25:48SEVERE  CoreContainer   
null:org.apache.solr.common.SolrException:
Error initializing QueryElevationComponent. 

what am I doing wrong?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/error-initializing-QueryElevationComponent-tp4035194.html
Sent from the Solr - User mailing list archive at Nabble.com.


SolrException: Error loading class 'org.apache.solr.response.transform.EditorialMarkerFactory'

2013-01-21 Thread eShard
Hi,
This is related to my earlier question regarding the elevationcomponent.
I tried turning this on:
<!-- If you are using the QueryElevationComponent, you may wish to mark
documents that get boosted.  The
EditorialMarkerFactory will do exactly that:
-->
<transformer name="qecBooster"
class="org.apache.solr.response.transform.EditorialMarkerFactory" />

but it fails to load this class.

I'm using solr 4.0 final.
How do I get this to load?

thanks,




--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrException-Error-loading-class-org-apache-solr-response-transform-EditorialMarkerFactory-tp4035203.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr multicore aborts with socket timeout exceptions

2013-01-17 Thread eShard
I'm currently running Solr 4.0 final on Tomcat v7.0.34, with ManifoldCF v1.2
dev running on Jetty.

I have Solr multicore set up with 10 cores. (Is this too much?)
So I also have at least 10 connectors set up in ManifoldCF (1 per core, 10
JVMs per connection).
From the look of it, Solr couldn't handle all the data that ManifoldCF was
sending it, and the connections would abort with socket timeout exceptions.
I tried increasing maxThreads to 200 on Tomcat and it didn't work.
In the ManifoldCF throttling section, I decreased the number of JVMs per
connection from 10 down to 1, and not only did the crawl speed up
significantly, the socket exceptions went away (for the most part).
Here's the ticket for this issue:
https://issues.apache.org/jira/browse/CONNECTORS-608

My question is this: how do I increase the number of connections on the solr
side so I can run multiple ManifoldCF jobs concurrently without aborting or
timeouts?

The ManifoldCF team did mention that there was a committer who had socket
timeout exceptions in a newer version of Solr and he fixed it by increasing
the timeout window. I'm looking for that patch if available.
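
In case it helps anyone else tuning the container side, this is the sort of
Tomcat server.xml connector change involved; the attribute names are standard
Tomcat, but the values here are only illustrative guesses, not tested
recommendations:

```xml
<!-- conf/server.xml: raise the worker thread cap and the socket timeout -->
<Connector port="8080" protocol="HTTP/1.1"
           maxThreads="400"
           connectionTimeout="60000"
           redirectPort="8443" />
```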

Thanks,



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-multicore-aborts-with-socket-timeout-exceptions-tp4034250.html
Sent from the Solr - User mailing list archive at Nabble.com.


Why do I keep seeing org.apache.solr.core.SolrCore execute in the tomcat logs

2013-01-17 Thread eShard
I keep seeing these in the tomcat logs:
Jan 17, 2013 3:57:33 PM org.apache.solr.core.SolrCore execute
INFO: [Lisa] webapp=/solr path=/admin/logging
params={since=1358453312320&wt=json} status=0 QTime=0

I'm just curious:
What is getting executed here? I'm not running any queries against this core
or using it in any way currently.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Why-do-I-keep-seeing-org-apache-solr-core-SolrCore-execute-in-the-tomcat-logs-tp4034353.html
Sent from the Solr - User mailing list archive at Nabble.com.


Tutorial for Solr query language, dismax and edismax?

2013-01-15 Thread eShard
Does anyone have a great tutorial for learning the solr query language,
dismax and edismax?
I've searched endlessly for one but I haven't been able to locate one that
is comprehensive enough and has a lot of examples (that actually work!).
I also tried wildcards, logical operators, and a phrase search, and they
either didn't work or didn't behave the way I thought they would.

For example, I tried to search a multivalued field solr.title and a content
field that contains their phone number (and a lot of other data).
So, from the Solr admin query page, in the q field I tried lots of variations
of this: solr.title:*Costa, Julie* AND content:tel=
And I either got 0 results or ALL the results.
solr.title would only work with solr.title:*Costa*, but not anything longer
than that, even though there are plenty of Costa, J's (John, Julie, Julia,
Jerry, etc.)
I should be able to do a phrase search out of the box, shouldn't I?
I also read on one site that only edismax can use logical operators, but I
couldn't get that to work either.
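
For what it's worth, here is a sketch of how a phrase query on those fields
can be built and URL-encoded (the host, core, and field names are taken from
the posts here; whether the phrase actually matches depends on how the fields
are analyzed):

```python
from urllib.parse import urlencode

# Hypothetical host and core name; adjust to your own setup.
base = "http://localhost:8080/solr/Lisa/select"

# Quote the phrase instead of using wildcards; defType=edismax keeps
# full Lucene syntax (AND/OR, field-qualified clauses) available.
params = {
    "q": 'solr.title:"Costa, Julie" AND content:"tel="',
    "defType": "edismax",
    "wt": "xml",
}
url = base + "?" + urlencode(params)
print(url)
```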
Can anyone point me in the right direction?
I'm currently using Solr 4.0 Final with ManifoldCF v 1.2 dev

Thank you,





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Tutorial-for-Solr-query-language-dismax-and-edismax-tp4033465.html
Sent from the Solr - User mailing list archive at Nabble.com.


ivy errors trying to build solr from trunk

2013-01-10 Thread eShard
I downloaded the latest from solr.
I applied a patch
cd to solr dir
and I try ant dist
I get these ivy errors
ivy-availability-check:
 [echo] Building analyzers-phonetic...
ivy-fail:
 [echo]  This build requires Ivy and Ivy could not be found in your
ant classpath.
 [echo]  (Due to classpath issues and the recursive nature of the
Lucene/Solr 
 [echo]  build system, a local copy of Ivy can not be used an loaded
dynamically 
 [echo]  by the build.xml)
 [echo]  You can either manually install a copy of Ivy 2.2.0 in your
ant classpath:
 [echo]http://ant.apache.org/manual/install.html#optionalTasks
 [echo]  Or this build file can do it for you by running the Ivy
Bootstrap target:
 [echo]ant ivy-bootstrap 
 [echo]  
 [echo]  Either way you will only have to install Ivy one time.
 [echo]  'ant ivy-bootstrap' will install a copy of Ivy into your
Ant User Library:
 [echo]C:\Users\da24005/.ant/lib
 [echo]  
 [echo]  If you would prefer, you can have it installed into an
alternative 
 [echo]  directory using the
-Divy_install_path=/some/path/you/choose option, 
 [echo]  but you will have to specify this path every time you build
Lucene/Solr 
 [echo]  in the future...
 [echo]ant ivy-bootstrap
-Divy_install_path=/some/path/you/choose
 [echo]...
 [echo]ant -lib /some/path/you/choose clean compile
 [echo]...
 [echo]ant -lib /some/path/you/choose clean compile
 [echo]  If you have already run ivy-bootstrap, and still get this
message, please 
 [echo]  try using the --noconfig option when running ant, or
editing your global
 [echo]  ant config to allow the user lib to be loaded.  See the
wiki for more details:
 [echo]http://wiki.apache.org/lucene-java/HowToContribute#antivy
 [echo] 

BUILD FAILED

I tried ivy-bootstrap but I still get the same error.
I have the Ivy jar in the Ant lib directory.

What am I doing wrong? And it says to use --noconfig if ivy-bootstrap didn't
work; well, --noconfig is not a valid ant command. Where/how do I use it?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/ivy-errors-trying-to-build-solr-from-trunk-tp4032300.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: ivy errors trying to build solr from trunk

2013-01-10 Thread eShard
Ok, the old problem was that Eclipse was using a different version of Ant
(1.8.3).
I dropped the Ivy jar in the build path, and now I get these errors:
[ivy:retrieve]  ERRORS
[ivy:retrieve]  Server access Error: Connection timed out: connect
url=http://repo1.maven.org/maven2/commons-codec/commons-codec/1.7/commons-codec-1.7.pom
[ivy:retrieve]  Server access Error: Connection timed out: connect
url=http://repo1.maven.org/maven2/commons-codec/commons-codec/1.7/commons-codec-1.7.jar
[ivy:retrieve]  Server access Error: Connection timed out: connect
url=http://oss.sonatype.org/content/repositories/releases/commons-codec/commons-codec/1.7/commons-codec-1.7.pom
[ivy:retrieve]  Server access Error: Connection timed out: connect
url=http://oss.sonatype.org/content/repositories/releases/commons-codec/commons-codec/1.7/commons-codec-1.7.jar
[ivy:retrieve]  Server access Error: Connection timed out: connect
url=http://mirror.netcologne.de/maven2/commons-codec/commons-codec/1.7/commons-codec-1.7.pom
[ivy:retrieve]  Server access Error: Connection timed out: connect
url=http://mirror.netcologne.de/maven2/commons-codec/commons-codec/1.7/commons-codec-1.7.jar

Apparently, I can't get to Maven since I'm behind a firewall.
Are the Solr dependencies available for manual download somewhere?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/ivy-errors-trying-to-build-solr-from-trunk-tp4032300p4032332.html
Sent from the Solr - User mailing list archive at Nabble.com.


solr invalid date string

2013-01-08 Thread eShard
I'm currently running solr 4.0 alpha with manifoldCF v1.1 dev
Manifold is sending solr the datetime as milliseconds expired after
1-1-1970.
I've tried setting several date.formats in the extraction handler but I
always get this error: 
and the manifoldcf crawl aborts.
SolrCoreorg.apache.solr.common.SolrException: Invalid Date
String:'134738361' at
org.apache.solr.schema.DateField.parseMath(DateField.java:174) at
org.apache.solr.schema.TrieField.createField(TrieField.java:540)

here's my extraction handler:
<requestHandler name="/update/extract"
    class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.content">text</str>
    <str name="fmap.title">solr.title</str>
    <str name="fmap.name">solr.name</str>
    <str name="link">link</str>
    <str name="fmap.pubdate">pubdate</str>
    <str name="summary">summary</str>
    <str name="description">comments</str>
    <str name="published">published</str>

    <str name="fmap.Last-Modified">last_modified</str>
    <str name="uprefix">attr_</str>
    <str name="lowernames">true</str>
    <str name="fmap.div">ignored_</str>
  </lst>
  <lst name="date.formats">
    <str>yyyy-MM-dd</str>
    <str>yyyy-MM-dd'T'HH:mm:ss.SSS'Z'</str>
  </lst>
</requestHandler>


here's pubdate in the schema
<field name="pubdate" type="date" indexed="true" stored="true"
multiValued="true"/>

The dates are already in UTC; they're just in milliseconds...

What am I doing wrong?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-invalid-date-string-tp4031661.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr invalid date string

2013-01-08 Thread eShard
I'll certainly ask manifold if they can send the date in the correct format.
Meanwhile;
How would I create an updater to change the format of a date?
Are there any decent examples out there?
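
While waiting on an answer, here is a minimal sketch of the conversion such an
updater would have to perform, assuming the input really is epoch milliseconds
as ManifoldCF sends them (the example value below is made up, not the one from
my logs):

```python
from datetime import datetime, timezone

def millis_to_solr_date(ms: int) -> str:
    """Convert epoch milliseconds to Solr's yyyy-MM-dd'T'HH:mm:ss.SSS'Z' form."""
    dt = datetime.fromtimestamp(ms / 1000.0, tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}Z"

print(millis_to_solr_date(1347380000000))  # 2012-09-11T16:13:20.000Z
```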

thanks,



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-invalid-date-string-tp4031661p4031669.html
Sent from the Solr - User mailing list archive at Nabble.com.


is there an easy way to upgrade from Solr 4 alpha to 4.0 final?

2013-01-08 Thread eShard
I just found out I must upgrade to Solr 4.0 final (from 4.0 alpha)
I'm currently running Solr 4.0 alpha on Tomcat 7.
Is there an easy way to surgically replace files and upgrade? 
Or should I completely start over with a fresh install?
Ideally, I'm looking for a set of steps...
Thanks,




--
View this message in context: 
http://lucene.472066.n3.nabble.com/is-there-an-easy-way-to-upgrade-from-Solr-4-alpha-to-4-0-final-tp4031682.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Too many Tika errors

2012-12-12 Thread eShard
Ok, I managed to fix the universal charset error; it is caused by a missing
dependency: just download universalchardet-1.0.3.jar and put it in your
extraction lib directory.

The Microsoft errors will probably be fixed in a future release of the POI
jars (v3.9 didn't fix this error).



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Too-many-Tika-errors-tp4026126p4026347.html
Sent from the Solr - User mailing list archive at Nabble.com.


Too many Tika errors

2012-12-11 Thread eShard
I'm running Solr 4.0 on Tomcat 7.0.8 and I'm running the solr/example single
core as well with manifoldcf v1.1
I had everything working, but then the crawler stopped and I had Tika errors
in the Solr log.
I had Tika 1.1, and that produced these errors:
org.apache.solr.common.SolrException:
org.apache.tika.exception.TikaException: Unexpected RuntimeException from
org.apache.tika.parser.microsoft.OfficeParser@17bc9c03

So I upgraded to Tika 1.2, and again everything seemed to be working (I
indexed 24,000 files); then I recrawled the repository and again it stopped.
This time the Tika errors are:
null:java.lang.RuntimeException: java.lang.NoClassDefFoundError:
org/mozilla/universalchardet/CharsetListener at
org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:456)

What's going on here? What version of tika should I use?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Too-many-Tika-errors-tp4026126.html
Sent from the Solr - User mailing list archive at Nabble.com.


How do you get the document name from Open Text?

2012-08-02 Thread eShard
I'm using Solr 4.0 with ManifoldCF .5.1 crawling Open Text v10.5. 
I have the cats/atts turned on in Open Text and I can see them all in the
Solr index.
However, the id is just the URL to download the doc from open text and the
document name either from Open Text or the document properties is nowhere to
be found.
I tried using resourceName in the solrconfig.xml as it was described in the
manual but it doesn't work.
I used this:
<requestHandler name="/update/extract"
    class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.content">text</str>
    <str name="fmap.Last-Modified">last_modified</str>
    <str name="uprefix">attr_</str>
    <str name="resourceName">File Name</str>
    <str name="lowernames">true</str>
  </lst>
</requestHandler>

but all I get is File Name in resourceName. Should I leave the value blank
or is there some other field I should use?
Please advise



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-do-you-get-the-document-name-from-Open-Text-tp3998908.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr not getting OpenText document name and metadata

2012-07-27 Thread eShard
Hi,
I'm currently using ManifoldCF (v.5.1) to crawl OpenText (v10.5) and the
output is sent to Solr (4.0 alpha).
All I see in the index is an id = to the opentext download URL and a version
(a big integer value).
What I don't see is the document name from OpenText or any of the Opentext
metadata.
Does anyone know how I can get this data? I can't even search by
document name or by document extension!
Only a few of the documents actually have a title in the Solr index, but the
OpenText name of the document is nowhere to be found.
If I know some text within the document, I can search for that.
I'm using the default schema with Tika as the extraction handler.
I'm also using uprefix = attr_ to get all of the ignored properties, but most
of those are useless.
Please advise...



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-not-getting-OpenText-document-name-and-metadata-tp3997786.html
Sent from the Solr - User mailing list archive at Nabble.com.