Re: Reading timestamp for DIH

2010-12-04 Thread Koji Sekiguchi

(10/11/24 6:05), Siddharth Powar wrote:

Hey,

Is it possible to read the timestamp that the DataImportHandler uses for a
delta-import from a location other than conf/dataimport.properties?

Thanks,
Sid


No. There is an open issue for this problem:

https://issues.apache.org/jira/browse/SOLR-1970

Koji
--
http://www.rondhuit.com/en/


Re: finding exact case insensitive matches on single and multiword values

2010-12-04 Thread PeterKerk

Geert-Jan and Erick, thanks!

What I tried first is making it work with string type, that works perfect
for all lowercase values!

What I do not understand is how and why I have to make the casing work at
the client, since the casing differs in the database. Right now in the
database I have values for city:
Den Haag
Den HAAG
den haag
den haag

using fq=city:(den\ haag) gives me 2 results.

So it seems to me that because of the string type this casing issue cannot
be resolved as long as I'm using this fieldtype?


Then to the solution of tweaking the fieldtype for me to work.
I have this right now:

<fieldType name="myField" class="solr.TextField" sortMissingLast="true"
  omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

But I find it difficult to test what the result of the filters is, and,
as Erick already mentioned, the result looks correct but really
isn't...
Is there some tool where I can add and remove filters to quickly see
what the output will be (without having to reload schema.xml and do a
reimport)?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/finding-exact-case-insensitive-matches-on-single-and-multiword-values-tp2012207p2017851.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Problem with DIH delta-import delete.

2010-12-04 Thread Koji Sekiguchi

(10/11/17 20:18), Matti Oinas wrote:

Solr does not delete documents from index although delta-import says
it has deleted n documents from index. I'm using version 1.4.1.

The schema looks like

  <fields>
    <field name="uuid" type="string" indexed="true" stored="true"
      required="true" />
    <field name="type" type="int" indexed="true" stored="true"
      required="true" />
    <field name="blog_id" type="int" indexed="true" stored="true" />
    <field name="entry_id" type="int" indexed="false" stored="true" />
    <field name="content" type="textgen" indexed="true" stored="true" />
  </fields>
  <uniqueKey>uuid</uniqueKey>


Relevant fields from database tables:

TABLE: blogs and entries both have

   Field: id
Type: int(11)
Null: NO
 Key: PRI
Default: NULL
   Extra: auto_increment

   Field: modified
Type: datetime
Null: YES
 Key:
Default: NULL
   Extra:

   Field: status
Type: tinyint(1) unsigned
Null: YES
 Key:
Default: NULL
   Extra:


<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" ... />
  <document>
    <entity name="blog"
            pk="id"
            query="SELECT id,description,1 as type FROM blogs WHERE status=2"
            deltaImportQuery="SELECT id,description,1 as type FROM blogs WHERE
                status=2 AND id='${dataimporter.delta.id}'"
            deltaQuery="SELECT id FROM blogs WHERE
                '${dataimporter.last_index_time}' &lt; modified AND status=2"
            deletedPkQuery="SELECT id FROM blogs WHERE
                '${dataimporter.last_index_time}' &lt;= modified AND status=3"
            transformer="TemplateTransformer">
      <field column="uuid" name="uuid" template="blog-${blog.id}" />
      <field column="id" name="blog_id" />
      <field column="description" name="content" />
      <field column="type" name="type" />
    </entity>
    <entity name="entry"
            pk="id"
            query="SELECT f.id as id,f.content,f.blog_id,2 as type FROM
                entries f,blogs b WHERE f.blog_id=b.id AND b.status=2"
            deltaImportQuery="SELECT f.id as id,f.content,f.blog_id,2 as type
                FROM entries f,blogs b WHERE f.blog_id=b.id AND
                f.id='${dataimporter.delta.id}'"
            deltaQuery="SELECT f.id as id FROM entries f JOIN blogs b ON
                b.id=f.blog_id WHERE '${dataimporter.last_index_time}' &lt; b.modified
                AND b.status=2"
            deletedPkQuery="SELECT f.id as id FROM entries f JOIN blogs b ON
                b.id=f.blog_id WHERE b.status!=2 AND '${dataimporter.last_index_time}'
                &lt; b.modified"
            transformer="HTMLStripTransformer,TemplateTransformer">
      <field column="uuid" name="uuid" template="entry-${entry.id}" />
      <field column="id" name="entry_id" />
      <field column="blog_id" name="blog_id" />
      <field column="content" name="content" stripHTML="true" />
      <field column="type" name="type" />
    </entity>
  </document>
</dataConfig>

Full import and delta import work without problems when it comes to
adding new documents to the index, but when a blog is deleted (status is
set to 3 in the database), the Solr report after delta-import is something
like "Indexing completed. Added/Updated: 0 documents. Deleted 81
documents.". The problem is that the documents are still found in the Solr
index.

1. UPDATE blogs SET modified=NOW(),status=3 WHERE id=26;

2. delta-import =

<str name="">
Indexing completed. Added/Updated: 0 documents. Deleted 81 documents.
</str>
<str name="Committed">2010-11-17 13:00:50</str>
<str name="Optimized">2010-11-17 13:00:50</str>

So Solr says it has deleted documents, and that the index is also optimized
and committed after the operation.

3. Search; blog_id:26 still returns 1 document with type 1 (blog) and
80 documents with type 2 (entry).



Hi Matti,

Can you see something like the following "Completed DeletedRowKey for Entity"
and then "Deleting document: ID-1" in your Solr log?

(sample messages from my Solr log)
Dec 4, 2010 8:25:40 PM org.apache.solr.handler.dataimport.DocBuilder 
collectDelta
INFO: Completed DeletedRowKey for Entity: product rows obtained : 2
  :
Dec 4, 2010 8:25:40 PM org.apache.solr.handler.dataimport.DocBuilder deleteAll
INFO: Deleting stale documents
Dec 4, 2010 8:25:40 PM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
INFO: Deleting document: OVEN-2
  :

If you cannot find these messages, I think there is some incorrect
setting (but I couldn't find an incorrect one in your data-config.xml...).

Koji
--
http://www.rondhuit.com/en/


Re: finding exact case insensitive matches on single and multiword values

2010-12-04 Thread Ahmet Arslan
 Then to the solution of tweaking the fieldtype for me to
 work.
 I have this right now:

 <fieldType name="myField"
   class="solr.TextField" sortMissingLast="true"
   omitNorms="true">
   <analyzer>
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>


Additionally you can add TrimFilterFactory to your analyzer chain. 

And instead of escaping white spaces you can use RawQParserPlugin.
fq={!raw f=city}den haag
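As a sketch of Ahmet's suggestion (field type name illustrative, not from the original thread), the chain with TrimFilterFactory added would look like:

```xml
<fieldType name="string_lc" class="solr.TextField" sortMissingLast="true"
  omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- strip leading/trailing whitespace, then lowercase the single token -->
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```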


RE: finding exact case insensitive matches on single and multiword values

2010-12-04 Thread Jonathan Rochkind
ALL solr queries are case-sensitive.  

The trick is in the analyzers.  If you downcase everything at index time before 
you put it in the index, and downcase all queries at query time too -- then you 
have case-insensitive query.   Not because the Solr search algorithms are case 
insensitive, but because you've normalized all values to be all lowercase at 
both index and query time, so things will match. 

You can only do this kind of normalization through analyzers on a Solr text 
field, not a Solr string field. It's what the Solr text type is for. 

This wiki page, and this question in particular, will be helpful to you:
http://wiki.apache.org/solr/SolrRelevancyCookbook#Relevancy_and_Case_Matching
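The normalize-at-both-ends idea Jonathan describes can be sketched outside Solr as a toy model (plain Python standing in for the analyzer chain, not Solr's API):

```python
def normalize(value):
    # Mimics KeywordTokenizer + LowerCaseFilter: treat the whole value
    # as a single token and lowercase it.
    return value.lower()

# "Index time": store the normalized city values.
indexed = {normalize(v) for v in ["Den Haag", "Den HAAG", "den haag"]}

# "Query time": normalize the query the same way, so any casing matches.
print(normalize("DEN Haag") in indexed)  # True
```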

From: PeterKerk [vettepa...@hotmail.com]
Sent: Saturday, December 04, 2010 6:24 AM
To: solr-user@lucene.apache.org
Subject: Re: finding exact case insensitive matches on single and multiword 
values

Geert-Jan and Erick, thanks!

What I tried first is making it work with string type, that works perfect
for all lowercase values!

What I do not understand is how and why I have to make the casing work at
the client, since the casing differs in the database. Right now in the
database I have values for city:
Den Haag
Den HAAG
den haag
den haag

using fq=city:(den\ haag) gives me 2 results.

So it seems to me that because of the string type this casing issue cannot
be resolved as long as I'm using this fieldtype?


Then to the solution of tweaking the fieldtype for me to work.
I have this right now:

fieldType name=myField class=solr.TextField sortMissingLast=true
omitNorms=true
analyzer
tokenizer class=solr.KeywordTokenizerFactory/
filter class=solr.LowerCaseFilterFactory/
/analyzer
/fieldType

But I find it difficult to test what the result of the filters are, and
since as Erick already mentioned, the result looks correct but really
isnt...
Is there some tool where I can add and remove the filters to quickly see
what the output will be? (without having to reload schema.xml and do
reimport?
--
View this message in context: 
http://lucene.472066.n3.nabble.com/finding-exact-case-insensitive-matches-on-single-and-multiword-values-tp2012207p2017851.html
Sent from the Solr - User mailing list archive at Nabble.com.


autocommit commented out -- what is the default?

2010-12-04 Thread Brian Whitman
Hi, if you comment out the block in solrconfig.xml

<!--
   <autoCommit>
     <maxDocs>1</maxDocs>
     <maxTime>60</maxTime>
   </autoCommit>
-->

Does this mean that (a) commits never happen automatically or (b) some
default autocommit is applied?


Re: autocommit commented out -- what is the default?

2010-12-04 Thread Yonik Seeley
On Sat, Dec 4, 2010 at 10:36 AM, Brian Whitman br...@echonest.com wrote:
 Hi, if you comment out the block in solrconfig.xml

 <!--
   <autoCommit>
     <maxDocs>1</maxDocs>
     <maxTime>60</maxTime>
   </autoCommit>
 -->

 Does this mean that (a) commits never happen automatically or (b) some
 default autocommit is applied?

Commented out means they never happen automatically (i.e., no default).
In general commitWithin is a better strategy to use... bulk updates
can use a large value (or no value w/ explicit commit at end) for
better indexing performance, while other updates can use a smaller
value depending on how soon the update needs to be visible.
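As a sketch, commitWithin is passed on the update request itself; for example, an XML update message (value in milliseconds, document contents illustrative):

```xml
<add commitWithin="10000">
  <doc>
    <field name="id">example-1</field>
  </doc>
</add>
```

Solr guarantees a commit within the given window, so bulk loads can use a large value while latency-sensitive updates use a small one.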

-Yonik
http://www.lucidimagination.com


Re: Batch Update Fields

2010-12-04 Thread Adam Estrada
Synonyms, eh? I have a synonym list like the following, so how do I identify
the synonyms on a specific field? The only place the field is used is as a
facet.

original field = country name

AF = AFGHANISTAN
AX = ÅLAND ISLANDS
AL = ALBANIA
DZ = ALGERIA
AS = AMERICAN SAMOA
AD = ANDORRA
AO = ANGOLA
AI = ANGUILLA
AQ = ANTARCTICA
AG = ANTIGUA AND BARBUDA
AR = ARGENTINA
AM = ARMENIA
AW = ARUBA
AU = AUSTRALIA
AT = AUSTRIA
etc...

Any advice on that would be great and very much appreciated!

Adam

On Fri, Dec 3, 2010 at 3:55 PM, Erick Erickson erickerick...@gmail.comwrote:

 That will certainly work. Another option, assuming the country codes are
 in their own field would be to put the transformations into a synonym file
 that was only used on that field. That way you'd get this without having
 to do the pre-process step of the raw data...

 That said, if your pre-processing is working for you, it may not be worth
 your while
 to worry about doing it differently

 Best
 Erick

 On Fri, Dec 3, 2010 at 12:51 PM, Adam Estrada 
 estrada.adam.gro...@gmail.com
  wrote:

  First off...I know enough about Solr to be VERY dangerous so please bear
  with me ;-) I am indexing the geonames database which only provides
 country
  codes. I can facet the codes but to the end user who may not know all 249
  codes, it isn't really all that helpful. Therefore, I want to map the
 full
  country names to the country codes provided in the geonames db.
  http://download.geonames.org/export/dump/
 
  http://download.geonames.org/export/dump/I used a simple split
 function
  to
  chop the 850 meg txt file in to manageable csv's that I can import in to
  Solr. Now that all 7 million + documents are in there, I want to change
 the
  country codes to the actual country names. I would have liked to have done
 it
  in the index but finding and replacing the strings in the csv seems to be
  working fine. After that I can just reindex the entire thing.
 
  Adam
 
  On Fri, Dec 3, 2010 at 12:42 PM, Erick Erickson erickerick...@gmail.com
  wrote:
 
   Have you consider defining synonyms for your code -country
   conversion at index time (or query time for that matter)?
  
   We may have an XY problem here. Could you state the high-level
   problem you're trying to solve? Maybe there's a better solution...
  
   Best
   Erick
  
   On Fri, Dec 3, 2010 at 12:20 PM, Adam Estrada 
   estrada.adam.gro...@gmail.com
wrote:
  
I wonder...I know that sed would work to find and replace the terms
 in
   all
of the csv files that I am indexing but would it work to find and
  replace
key terms in the index?
   
find C:\\tmp\\index\\data -type f -exec sed -i 's/AF/AFGHANISTAN/g'
 {}
  \;
   
That command would iterate through all the files in the data
 directory
   and
 replace the country code with the full country name. I may just back
  up
the
directory and try it. I have it running on csv files right now and
 it's
working wonderfully. For those of you interested, I am indexing the
   entire
Geonames dataset
   http://download.geonames.org/export/dump/(allCountries.zip)
which gives me a pretty comprehensive world gazetteer. My next step
 is
gonna
be to display the results as KML to view over a google globe.
   
Thoughts?
   
Adam
   
On Fri, Dec 3, 2010 at 7:57 AM, Erick Erickson 
  erickerick...@gmail.com
wrote:
   
 No, there's no equivalent to SQL update for all values in a column.
You'll
 have to reindex all the documents.

 On Thu, Dec 2, 2010 at 10:52 PM, Adam Estrada 
 estrada.adam.gro...@gmail.com
  wrote:

  OK part 2 of my previous question...
 
  Is there a way to batch update field values based on a certain
criteria?
  For example, if thousands of documents have a field value of 'US'
  can
   I
  update all of them to 'United States' programmatically?
 
  Adam

   
  
 



Re: Batch Update Fields

2010-12-04 Thread Erick Erickson
When you define your fieldType at index time. My idea
was that you substitute these on the way in to your
index. You may need a specific field type just for your
country conversion. Perhaps in a copyField if
you need both the code and the full name.

Best
Erick
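A sketch of what Erick describes — a field type that expands codes via a synonym file at index time (file and type names are illustrative, not from this thread):

```xml
<!-- synonyms_country.txt (illustrative):
     AF => AFGHANISTAN
     AL => ALBANIA
-->
<fieldType name="country" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_country.txt"
            ignoreCase="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
```

With the mapping applied at index time, the facet values come back as full country names without pre-processing the CSVs.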

On Sat, Dec 4, 2010 at 12:16 PM, Adam Estrada estrada.adam.gro...@gmail.com
 wrote:

 Synonyms, eh? I have a synonym list like the following, so how do I identify
 the synonyms on a specific field? The only place the field is used is as a
 facet.

 original field = country name

 AF = AFGHANISTAN
 AX = ÅLAND ISLANDS
 AL = ALBANIA
 DZ = ALGERIA
 AS = AMERICAN SAMOA
 AD = ANDORRA
 AO = ANGOLA
 AI = ANGUILLA
 AQ = ANTARCTICA
 AG = ANTIGUA AND BARBUDA
 AR = ARGENTINA
 AM = ARMENIA
 AW = ARUBA
 AU = AUSTRALIA
 AT = AUSTRIA
 etc...

 Any advice on that would be great and very much appreciated!

 Adam

 On Fri, Dec 3, 2010 at 3:55 PM, Erick Erickson erickerick...@gmail.com
 wrote:

  That will certainly work. Another option, assuming the country codes are
  in their own field would be to put the transformations into a synonym
 file
  that was only used on that field. That way you'd get this without having
  to do the pre-process step of the raw data...
 
  That said, if your pre-processing is working for you, it may not be worth
  your while
  to worry about doing it differently
 
  Best
  Erick
 
  On Fri, Dec 3, 2010 at 12:51 PM, Adam Estrada 
  estrada.adam.gro...@gmail.com
   wrote:
 
   First off...I know enough about Solr to be VERY dangerous so please
  bear
   with me ;-) I am indexing the geonames database which only provides
  country
   codes. I can facet the codes but to the end user who may not know all
 249
   codes, it isn't really all that helpful. Therefore, I want to map the
  full
   country names to the country codes provided in the geonames db.
   http://download.geonames.org/export/dump/
  
   http://download.geonames.org/export/dump/I used a simple split
  function
   to
   chop the 850 meg txt file in to manageable csv's that I can import in
 to
   Solr. Now that all 7 million + documents are in there, I want to change
  the
    country codes to the actual country names. I would have liked to have
 done
  it
   in the index but finding and replacing the strings in the csv seems to
 be
   working fine. After that I can just reindex the entire thing.
  
   Adam
  
   On Fri, Dec 3, 2010 at 12:42 PM, Erick Erickson 
 erickerick...@gmail.com
   wrote:
  
Have you consider defining synonyms for your code -country
conversion at index time (or query time for that matter)?
   
We may have an XY problem here. Could you state the high-level
problem you're trying to solve? Maybe there's a better solution...
   
Best
Erick
   
On Fri, Dec 3, 2010 at 12:20 PM, Adam Estrada 
estrada.adam.gro...@gmail.com
 wrote:
   
 I wonder...I know that sed would work to find and replace the terms
  in
all
 of the csv files that I am indexing but would it work to find and
   replace
 key terms in the index?

 find C:\\tmp\\index\\data -type f -exec sed -i 's/AF/AFGHANISTAN/g'
  {}
   \;

 That command would iterate through all the files in the data
  directory
and
  replace the country code with the full country name. I may just
 back
   up
 the
 directory and try it. I have it running on csv files right now and
  it's
 working wonderfully. For those of you interested, I am indexing the
entire
 Geonames dataset
http://download.geonames.org/export/dump/(allCountries.zip)
 which gives me a pretty comprehensive world gazetteer. My next step
  is
 gonna
 be to display the results as KML to view over a google globe.

 Thoughts?

 Adam

 On Fri, Dec 3, 2010 at 7:57 AM, Erick Erickson 
   erickerick...@gmail.com
 wrote:

  No, there's no equivalent to SQL update for all values in a
 column.
 You'll
  have to reindex all the documents.
 
  On Thu, Dec 2, 2010 at 10:52 PM, Adam Estrada 
  estrada.adam.gro...@gmail.com
   wrote:
 
   OK part 2 of my previous question...
  
   Is there a way to batch update field values based on a certain
 criteria?
   For example, if thousands of documents have a field value of
 'US'
   can
I
   update all of them to 'United States' programmatically?
  
   Adam
 

   
  
 



Re: How to make a client in JSP which will take output from Solr Server

2010-12-04 Thread Anurag

OK, I solved it by just opening the connection and then parsing the XML
output onto the front page, though it has some security issues...
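The approach Anurag describes — fetch the XML response and pull fields out of it for the page — can be sketched as follows (shown in Python against a canned response string for illustration; a JSP client would do the equivalent with a Java XML parser against the live select URL):

```python
import xml.etree.ElementTree as ET

# A minimal canned Solr XML response (a real client would fetch
# http://host:8983/solr/select?q=... instead; field names illustrative).
response = """<response>
  <result name="response" numFound="1" start="0">
    <doc><str name="id">1</str><str name="title">hello</str></doc>
  </result>
</response>"""

root = ET.fromstring(response)
# Each <doc> child is a typed field element with a name attribute.
docs = [{f.get("name"): f.text for f in doc} for doc in root.iter("doc")]
print(docs)  # [{'id': '1', 'title': 'hello'}]
```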

-
Kumar Anurag

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-make-a-client-in-JSP-which-will-take-output-from-Solr-Server-tp1519527p2019632.html
Sent from the Solr - User mailing list archive at Nabble.com.


FastVectorHighlighter ignoring fragmenter parameter . . .

2010-12-04 Thread CRB
Got the FVH to work in Solr 3.1 (or at least I presume I have, given I
can see multi-color highlighting in the output).


But I am not able to get it to recognize the regex fragmenter. I get 
no change in output if I specify the fragmenter. In fact, I can even 
enter bogus names for the fragmenter and get no change in the output.


Grateful for any suggestions.

Settings and output below.

Christopher


*Query*

   http://localhost:8983/solr/10k-Fragments/select?
   q=content%3Aliquidity
   rows=100
   fl=id%2Ccontent
   qt=standard
   hl.fl=content
   hl.useFastVectorHighlighter=true
   hl=true
   hl.fragmentsBuilder=colored
   hl.fragmenter=regex

*Response* (Abbreviated)

   <response>
   <lst name="responseHeader">
   <int name="status">0</int>
   <int name="QTime">47</int>
   <lst name="params">
   <str name="fl">id,content</str>
   <str name="hl.useFastVectorHighlighter">true</str>
   <str name="q">content:liquidity</str>
   <str name="hl.fragmenter">regex1text</str>
   <str name="hl.fl">content</str>
   <str name="hl.fragmentsBuilder">colored</str>
   <str name="qt">standard</str>
   <str name="hl">true</str>
   <str name="rows">100</str>
   </lst>
   </lst>
   . . .
   <lst name="highlighting">
   <lst
   name="10K/1997-12-31/1998-04-01/1stBergenBancorp/0001005016/ManagementsDiscussionAndAnalysisOfFinancialConditionAndResultsOfOperations/LiquidityAndCapitalResource/paragraph/1/mh1261">
   <arr name="content">
   <str>
   &#4504; <b style="background:yellow">Liquidity</b> is a measure of a
   bank's ability to fund loans and withdrawals of deposits in a cost-ef
   </str>
   </arr>
   </lst>
   . . .

*Field listing in schema.xml*

   <field name="content" type="text" indexed="true" stored="true"
     termVectors="true" termPositions="true" termOffsets="true"/>

*Highlighter listing in solrconfig.xml*

   <highlighting>

     <fragmenter name="gap"
       class="org.apache.solr.highlight.GapFragmenter" default="true">
       <lst name="defaults">
         <int name="hl.fragsize">100</int>
       </lst>
     </fragmenter>
     <fragmenter name="regex"
       class="org.apache.solr.highlight.RegexFragmenter">
       <lst name="defaults">
         <int name="hl.fragsize">70</int>
         <float name="hl.regex.slop">0.5</float>
         <str name="hl.regex.pattern">[-\w ,/\n\']{20,200}</str>
       </lst>
     </fragmenter>

     <formatter name="html"
       class="org.apache.solr.highlight.HtmlFormatter" default="true">
       <lst name="defaults">
         <str name="hl.simple.pre"><![CDATA[<em>]]></str>
         <str name="hl.simple.post"><![CDATA[</em>]]></str>
       </lst>
     </formatter>

     <!-- Configure the standard encoder -->
     <encoder name="html" class="org.apache.solr.highlight.HtmlEncoder"
       default="true"/>

     <!-- Configure the standard fragListBuilder -->
     <fragListBuilder name="simple"
       class="org.apache.solr.highlight.SimpleFragListBuilder" default="true"/>

     <!-- multi-colored tag FragmentsBuilder -->
     <fragmentsBuilder name="colored"
       class="org.apache.solr.highlight.ScoreOrderFragmentsBuilder"
       default="true">
       <lst name="defaults">
         <str name="hl.tag.pre"><![CDATA[
           <b style="background:yellow">,<b style="background:lawgreen">,
           <b style="background:aquamarine">,<b style="background:magenta">,
           <b style="background:palegreen">,<b style="background:coral">,
           <b style="background:wheat">,<b style="background:khaki">,
           <b style="background:lime">,<b style="background:deepskyblue">]]></str>
         <str name="hl.tag.post"><![CDATA[</b>]]></str>
       </lst>
     </fragmentsBuilder>
   </highlighting>



Solr Got Exceptions When schema.xml is Changed

2010-12-04 Thread Bing Li
Dear all,

I am a new user of Solr. Now I am just trying to try some basic samples.
Solr can be started correctly with Tomcat.

However, when putting a new schema.xml under SolrHome/conf and starting
Tomcat again, I got the following two exceptions.

Solr cannot be started correctly unless I use the initial schema.xml
from Solr.

Why can't I change the schema.xml?

Thanks so much!
Bing

Dec 5, 2010 4:52:49 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.NullPointerException
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:173)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at
org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:52)
at org.apache.solr.core.SolrCore$3.call(SolrCore.java:1146)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

-

SEVERE: Could not start SOLR. Check solr/home property
org.apache.solr.common.SolrException: QueryElevationComponent requires the
schema to have a uniqueKeyField implemented using StrField
at
org.apache.solr.handler.component.QueryElevationComponent.inform(QueryElevationComponent.java:157)
at
org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:508)
at org.apache.solr.core.SolrCore.init(SolrCore.java:588)
at
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
at
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
at
org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:273)

at
org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:254)
at
org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:372)
at
org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:98)
at
org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4405)
at
org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5037)
at
org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:140)
at
org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:812)
at
org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:787)
at
org.apache.catalina.core.StandardHost.addChild(StandardHost.java:570)
at
org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:891)
at
org.apache.catalina.startup.HostConfig.deployWARs(HostConfig.java:683)
at
org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:466)
at
org.apache.catalina.startup.HostConfig.start(HostConfig.java:1267)
at
org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:308)
at
org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
at
org.apache.catalina.util.LifecycleBase.fireLifecycleEvent(LifecycleBase.java:89)
at
org.apache.catalina.util.LifecycleBase.setState(LifecycleBase.java:328)
at
org.apache.catalina.util.LifecycleBase.setState(LifecycleBase.java:308)
at
org.apache.catalina.core.ContainerBase.startInternal(ContainerBase.java:1043)
at
org.apache.catalina.core.StandardHost.startInternal(StandardHost.java:738)
at
org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:140)
at
org.apache.catalina.core.ContainerBase.startInternal(ContainerBase.java:1035)
at
org.apache.catalina.core.StandardEngine.startInternal(StandardEngine.java:289)
at
org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:140)
at
org.apache.catalina.core.StandardService.startInternal(StandardService.java:442)
at
org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:140)
at
org.apache.catalina.core.StandardServer.startInternal(StandardServer.java:674)
at
org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:140)
at org.apache.catalina.startup.Catalina.start(Catalina.java:596)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 

Re: Solr Got Exceptions When schema.xml is Changed

2010-12-04 Thread Peter Karich



QueryElevationComponent requires the
schema to have a uniqueKeyField implemented using StrField


You should use the type StrField ('string') for the field used in
uniqueKeyField.
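For example (field name illustrative), the uniqueKey field should be declared with the string type in schema.xml:

```xml
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<uniqueKey>id</uniqueKey>
```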


RE: autocommit commented out -- what is the default?

2010-12-04 Thread Jonathan Rochkind
It means they never happen automatically; added documents won't be committed
until you send a commit to Solr.

Jonathan

From: Brian Whitman [br...@echonest.com]
Sent: Saturday, December 04, 2010 10:36 AM
To: solr-user@lucene.apache.org
Subject: autocommit commented out -- what is the default?

Hi, if you comment out the block in solrconfig.xml

<!--
   <autoCommit>
     <maxDocs>1</maxDocs>
     <maxTime>60</maxTime>
   </autoCommit>
-->

Does this mean that (a) commits never happen automatically or (b) some
default autocommit is applied?


Full text hit term highlighting

2010-12-04 Thread Rich Cariens
Anyone ever use Solr to present a view of a document with hit-terms
highlighted within?  Kind of like Google's cached copies? http://bit.ly/hgudWq


Re: Full text hit term highlighting

2010-12-04 Thread Lance Norskog
Set the fragment length to 0. This means highlight the entire text
body, if you have stored the text body.
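As a sketch of the request (host, core, and field names illustrative), the whole-body highlight would be asked for with:

```text
http://localhost:8983/solr/select?q=content:liquidity&hl=true&hl.fl=content&hl.fragsize=0
```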

Otherwise, you have to get the term vectors somehow and highlight the
text yourself.

I investigated this problem awhile back for PDFs. You can add a
starting page and an OR list of search terms to the URL that loads a
PDF into the in-browser version of the Adobe PDF reader. This allows
you to load the PDF at the first occurrence of any of the search terms,
with the terms highlighted. The search button takes you to the next of
any of the terms.

On Sat, Dec 4, 2010 at 4:10 PM, Rich Cariens richcari...@gmail.com wrote:
 Anyone ever use Solr to present a view of a document with hit-terms
 highlighted within?  Kind of like Google's cached copies?
 http://bit.ly/hgudWq




-- 
Lance Norskog
goks...@gmail.com


Re: Question about Solr Fieldtypes, Chaining of Tokenizers

2010-12-04 Thread Grant Ingersoll
Could you expand on your example and show the output you want?  FWIW, you could 
simply write a token filter that does the same thing as the WhitespaceTokenizer.

-Grant

On Dec 3, 2010, at 1:14 PM, Matthew Hall wrote:

 Hey folks, I'm working with a fairly specific set of requirements for our 
 corpus that needs a somewhat tricky text type for both indexing and searching.
 
 The chain currently looks like this:
 
 <tokenizer class="solr.WhitespaceTokenizerFactory"/>
 <filter class="solr.PatternReplaceFilterFactory"
   pattern="(.*?)(\p{Punct}*)$"
   replacement="$1"/>
 <filter class="solr.LowerCaseFilterFactory"/>
 <filter class="solr.StopFilterFactory"
   ignoreCase="true"
   words="stopwords.txt"
   enablePositionIncrements="true"
 />
 <filter class="solr.SnowballPorterFilterFactory" language="English"
   protected="protwords.txt"/>
 <filter class="solr.PatternReplaceFilterFactory"
   pattern="\p{Punct}"
   replacement=" "/>
 <tokenizer class="solr.WhitespaceTokenizerFactory"/>
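A rough Python model of the trailing-punctuation step in that chain (pattern `(.*?)(\p{Punct}*)$`, replacement `$1`; `string.punctuation` stands in for `\p{Punct}`, so this is an approximation rather than the exact Java regex semantics):

```python
import re
import string

PUNCT = re.escape(string.punctuation)

def strip_trailing_punct(token):
    # The lazy group keeps the shortest prefix whose remainder
    # is all punctuation, i.e. only trailing punctuation is dropped.
    return re.sub(r"^(.*?)[" + PUNCT + r"]*$", r"\1", token)

print(strip_trailing_punct("studies."))          # studies
print(strip_trailing_punct("TgThe(RX3fg+and)"))  # TgThe(RX3fg+and
```

Note how interior punctuation in the gene symbol survives; only the trailing characters are removed, which is why the stop-word filter can then act on the "normal English" tokens.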
 
 Now you will notice that I'm trying to add in a second tokenizer to this 
 chain at the very end, this is due to the final replacement of punctuation to 
 whitespace.  At that point I'd like to further break up these tokens to 
 smaller tokens.
 
 The reason for this is that we have a corpus mixing normal English words and
 scientific terms.  For example, you could expect a string like "The symposium
 of TgThe(RX3fg+and) gene studies" being added to the index, and parts of
 those phrases being searched on.
 
 We want to be able to remove the stopwords in the mostly english parts of 
 these types of statements, which the whitespace tokenizer, followed by 
 removing trailing punctuation,  followed by the stopfilter takes care of.  We 
 do not want to remove references to genetic information contained in allele 
 symbols and the like.
 
 Sadly as far as I can tell, you cannot chain tokenizers in the schema.xml, so 
 does anyone have some suggestions on how this could be accomplished?
 
 Oh, and let me add that the WordDelimiterFilter comes really close to what I 
 want, but since we are unwilling to promote our solr version to the trunk (we 
 are on the 1.4x) version atm, the inability to turn off the automatic phrase 
 queries makes it a no go.  We need to be able to make searches on 
 left/right match right/left.
 
 My searches through the old material on this subject isn't really showing me 
 much except some advice on using the copyField attribute.  But my 
 understanding is that this will simply take your original input to the field, 
 and then analyze it in two different ways depending on the field definitions. 
  It would be very nice if it were copying the already analyzed version of the 
 text... but that's not what its doing, right?
 
 Thanks for any advice on this matter.
 
 Matt
 
 

--
Grant Ingersoll
http://www.lucidimagination.com



Re: Question about Solr Fieldtypes, Chaining of Tokenizers

2010-12-04 Thread Robert Muir
On Fri, Dec 3, 2010 at 1:14 PM, Matthew Hall mh...@informatics.jax.org wrote:
 Oh, and let me add that the WordDelimiterFilter comes really close to what I
 want, but since we are unwilling to promote our solr version to the trunk
 (we are on the 1.4x) version atm, the inability to turn off the automatic
 phrase queries makes it a no go.  We need to be able to make searches on
 left/right match right/left.


if this is the case, it doesn't matter what your analysis does; it won't work.

Your only workaround, if you cannot upgrade, is to use PositionFilter
at query time... but then you cannot use phrase queries at all.
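A sketch of the query-time workaround Robert mentions (assuming solr.PositionFilterFactory is available in your Solr version; it collapses token positions so the query parser stops generating automatic phrase queries):

```xml
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.PositionFilterFactory"/>
</analyzer>
```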


Re: Exceptions in Embedded Solr

2010-12-04 Thread Tharindu Mathew
Any help on this?

On Thu, Dec 2, 2010 at 7:51 PM, Tharindu Mathew mcclou...@gmail.com wrote:
 Hi everyone,

 I get the exception below when using Embedded Solr suddenly. If I
 delete the Solr index it goes back to normal, but it obviously has to
 start indexing from scratch. Any idea what the cause of this is?

 java.lang.RuntimeException: java.io.FileNotFoundException:
 /home/evanthika/WSO2/CARBON/GREG/3.6.0/23-11-2010/normal/wso2greg-3.6.0/solr/data/index/segments_2
 (No such file or directory)
 at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068)
 at org.apache.solr.core.SolrCore.init(SolrCore.java:579)
 at 
 org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
 at 
 org.wso2.carbon.registry.indexing.solr.SolrClient.init(SolrClient.java:103)
 at 
 org.wso2.carbon.registry.indexing.solr.SolrClient.getInstance(SolrClient.java:115)
 ... 44 more
 Caused by: java.io.FileNotFoundException:
 /home/evanthika/WSO2/CARBON/GREG/3.6.0/23-11-2010/normal/wso2greg-3.6.0/solr/data/index/segments_2
 (No such file or directory)
 at java.io.RandomAccessFile.open(Native Method)
 at java.io.RandomAccessFile.init(RandomAccessFile.java:212)
 at 
 org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.init(SimpleFSDirectory.java:78)
 at 
 org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.init(SimpleFSDirectory.java:108)
 at 
 org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.init(NIOFSDirectory.java:94)
 at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:70)
 at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:691)
 at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:236)
 at org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:72)
 at 
 org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:683)
 at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:69)
 at org.apache.lucene.index.IndexReader.open(IndexReader.java:476)
 at org.apache.lucene.index.IndexReader.open(IndexReader.java:403)
 at 
 org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:38)
 at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1057)
 ... 48 more

 [2010-11-23 14:14:46,568] ERROR {org.apache.solr.core.SolrCore} -
 REFCOUNT ERROR: unreferenced org.apache.solr.core.solrc...@58f24b6
 (null) has a reference count of 1
 [2010-11-23 14:14:46,568] ERROR {org.apache.solr.core.SolrCore} -
 REFCOUNT ERROR: unreferenced org.apache.solr.core.solrc...@654dbbf6
 (null) has a reference count of 1
 [2010-11-23 14:14:46,568] ERROR {org.apache.solr.core.CoreContainer} -
 CoreContainer was not shutdown prior to finalize(), indicates a bug --
 POSSIBLE RESOURCE LEAK!!!
 [2010-11-23 14:14:46,568] ERROR {org.apache.solr.core.CoreContainer} -
 CoreContainer was not shutdown prior to finalize(), indicates a bug --
 POSSIBLE RESOURCE LEAK!!!

 --
 Regards,

 Tharindu



 --
 Regards,

 Tharindu




-- 
Regards,

Tharindu


Re: How to make a client in JSP which will take output from Solr Server

2010-12-04 Thread Gora Mohanty
On Sun, Dec 5, 2010 at 1:51 AM, Anurag anurag.it.jo...@gmail.com wrote:

 OK, I solved it by just opening the connection and then parsing the XML
 output onto the front page, though it has some security issues...

See AJAX Solr: http://evolvingweb.github.com/ajax-solr/

Regards,
Gora