Re: Solr 4.0 BETA Replication problems on Tomcat

2012-09-05 Thread Sami Siren
I opened SOLR-3789. As a workaround you can remove <str
name="compression">internal</str> from the config and it should work.
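
For clarity, the slave section quoted below would then look roughly like
this (a sketch with only the compression line removed):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://testslave:8080/solr/mycore/replication</str>
    <str name="pollInterval">00:00:50</str>
    <str name="httpConnTimeout">5000</str>
    <str name="httpReadTimeout">1</str>
  </lst>
</requestHandler>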

--
 Sami Siren

On Wed, Sep 5, 2012 at 5:58 AM, Ravi Solr ravis...@gmail.com wrote:
 Hello,
 I have a very simple setup: one master and one slave, configured
 as below, but replication keeps failing with the stack trace shown
 below. Note that 3.6 works fine on the same machines, so I am thinking
 that I am missing something in the configuration with regard to Solr
 4.0... can somebody kindly let me know if I am missing something? I am
 running Solr 4.0 on Tomcat 7.0.29 with Java 6. FYI, I never had any
 problem with Solr on Glassfish; this is the first time I am using it on
 Tomcat.

 On Master

 <requestHandler name="/replication" class="solr.ReplicationHandler">
   <lst name="master">
     <str name="replicateAfter">commit</str>
     <str name="replicateAfter">optimize</str>
     <str name="confFiles">schema.xml,stopwords.txt,synonyms.txt</str>
     <str name="commitReserveDuration">00:00:10</str>
   </lst>
 </requestHandler>

 On Slave

 <requestHandler name="/replication" class="solr.ReplicationHandler">
   <lst name="slave">
     <str name="masterUrl">http://testslave:8080/solr/mycore/replication</str>
     <str name="pollInterval">00:00:50</str>
     <str name="compression">internal</str>
     <str name="httpConnTimeout">5000</str>
     <str name="httpReadTimeout">1</str>
   </lst>
 </requestHandler>


 Error

 22:44:10 WARNING SnapPuller  Error in fetching packets

 java.util.zip.ZipException: unknown compression method
 at 
 java.util.zip.InflaterInputStream.read(InflaterInputStream.java:147)
 at 
 org.apache.solr.common.util.FastInputStream.readWrappedStream(FastInputStream.java:79)
 at 
 org.apache.solr.common.util.FastInputStream.refill(FastInputStream.java:88)
 at 
 org.apache.solr.common.util.FastInputStream.read(FastInputStream.java:124)
 at 
 org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:149)
 at 
 org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:144)
 at 
 org.apache.solr.handler.SnapPuller$FileFetcher.fetchPackets(SnapPuller.java:1024)
 at 
 org.apache.solr.handler.SnapPuller$FileFetcher.fetchFile(SnapPuller.java:985)
 at 
 org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:627)
 at 
 org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:331)
 at 
 org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:297)
 at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:175)
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
 at 
 java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
 at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)

 22:44:10 SEVERE  ReplicationHandler  SnapPull failed:
 org.apache.solr.common.SolrException: Unable to download
 _3_Lucene40_0.tip completely. Downloaded 0!=170 at
 org.apache.solr.handler.SnapPuller$FileFetcher.cleanup(SnapPuller.java:1115)
 at 
 org.apache.solr.handler.SnapPuller$FileFetcher.fetchFile(SnapPuller.java:999)
 at org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:627)
 at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:331)
 at 
 org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:297)
 at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:175) at
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
 at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
 at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150) at
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)


Re: Setting up two cores in solr.xml for Solr 4.0

2012-09-05 Thread veena rani
<cores adminPath="/admin/cores">
  <core name="core0" instanceDir="core0" />
  <core name="core1" instanceDir="core1" />
</cores>

Try the above code snippet in solr.xml. It works on Tomcat, at least.
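
A minimal complete solr.xml wrapping that snippet might look like this
(a sketch; the persistent attribute and paths depend on your setup):

<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0" />
    <core name="core1" instanceDir="core1" />
  </cores>
</solr>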

On Wed, Sep 5, 2012 at 1:10 AM, Chris Hostetter hossman_luc...@fucit.orgwrote:


 :   <core name="MYCORE_test" instanceDir="MYCORE" dataDir="MYCORE_test" />

 I'm pretty sure what you have above tells solr that core MYCORE_test
 should use the instanceDir MYCORE but ignore the <dataDir/> in that
 solrconfig.xml and use the one you specified.

 This on the other hand...

 :    <core name="MYCORE_test" instanceDir="MYCORE">
 :      <property name="dataDir" value="MYCORE_test" />
 :    </core>

 ...tells solr that the MYCORE_test SolrCore should use the instanceDir
 MYCORE, and when parsing that solrconfig.xml file it should set the
 variable ${dataDir} to be MYCORE_test -- but if your solrconfig.xml file
 does not ever refer to the ${dataDir} variable, it won't have any effect.

 so the question becomes -- what does your solrconfig.xml look like?


 -Hoss




-- 
Regards,
Veena.
Banglore.


Re: Sorting on mutivalued fields still impossible?

2012-09-05 Thread Toke Eskildsen
On Fri, 2012-08-31 at 13:35 +0200, Erick Erickson wrote:
 Imagine you have two entries, "aardvark" and "emu", in your
 multiValued field. How should that document sort relative to
 another doc with "camel" and "zebra"? Any heuristic
 you apply will be wrong for someone else.

I see two obvious choices here:

1) Sort by the value that is ordered first by the comparator function.
Doc1: aardvark, (emu)
Doc2: camel, (zebra)
This is what Uwe wants to do and it is normally done by preprocessing
and collapsing to a single value.
It could be implemented with an ordered multi-valued field cache by
comparing on the first (or last, in the case of reverse sort) entry for
each matching document.

2) Make duplicate entries in the result set, one for each value.
Doc1: aardvark, (emu)
Doc2: camel, (zebra)
Doc1: (aardvark), emu
Doc2: (camel), zebra
I have a hard time coming up with a real world use case for this.
It could be implemented by using a multi-valued field cache as above and
putting the same document ID into the sliding window sorter once for
each field value.

Collapsing this into a single algorithm:
Step through all IDs. For each ID, give access to the list of field
values and provide a callback for adding one or more (value, ID)-pairs
to the sliding windows sorter. 
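
As a rough illustration of option 1, a comparator over such a cache could
look like the following Java sketch (hypothetical names, not Solr/Lucene
API; the docId-to-values map stands in for the ordered multi-valued field
cache):

import java.util.Comparator;
import java.util.List;
import java.util.Map;

class FirstValueComparator implements Comparator<Integer> {
    // docId -> field values, each list pre-ordered by the comparator function
    private final Map<Integer, List<String>> fieldCache;
    private final boolean reverse;

    FirstValueComparator(Map<Integer, List<String>> fieldCache, boolean reverse) {
        this.fieldCache = fieldCache;
        this.reverse = reverse;
    }

    public int compare(Integer doc1, Integer doc2) {
        int cmp = pick(doc1).compareTo(pick(doc2));
        return reverse ? -cmp : cmp;
    }

    // Compare on the first entry for an ascending sort, the last entry
    // for a reverse sort, exactly as described above.
    private String pick(int docId) {
        List<String> values = fieldCache.get(docId);
        return reverse ? values.get(values.size() - 1) : values.get(0);
    }
}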


Are there some other realistic heuristics that I have missed?



Re: Solr Cloud Implementation with Apache Tomcat

2012-09-05 Thread bsargurunathan
Hi Rafal,

I got a standalone zookeeper working, and it starts fine.
But the next step is that I want to configure the zookeeper with my solr cloud
running on Apache Tomcat.
How is that possible? Can you please tell me the steps which I have to
follow to implement Solr Cloud with Apache Tomcat? Thanks in advance.

Thanks,
Guru



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Cloud-Implementation-with-Apache-Tomcat-tp4005209p4005528.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Solr Cloud Implementation with Apache Tomcat

2012-09-05 Thread Markus Jelsma
Set the -DzkHost= property in some Tomcat configuration as per the wiki page 
and point it to the Zookeeper(s). On Debian systems you can use 
/etc/default/tomcat6 to configure your properties.
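
On a Debian-style install, the line in /etc/default/tomcat6 might look
like this (a sketch; the ZooKeeper host:port list is a placeholder for
whatever your ensemble uses):

JAVA_OPTS="$JAVA_OPTS -DzkHost=zkhost1:2181,zkhost2:2181,zkhost3:2181"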

 
 
-Original message-
 From:bsargurunathan bsargurunat...@gmail.com
 Sent: Wed 05-Sep-2012 10:40
 To: solr-user@lucene.apache.org
 Subject: Re: Solr Cloud Implementation with Apache Tomcat
 
 Hi Rafal,
 
 I got a standalone zookeeper working, and it starts fine.
 But the next step is that I want to configure the zookeeper with my solr cloud
 running on Apache Tomcat.
 How is that possible? Can you please tell me the steps which I have to
 follow to implement Solr Cloud with Apache Tomcat? Thanks in advance.
 
 Thanks,
 Guru
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-Cloud-Implementation-with-Apache-Tomcat-tp4005209p4005528.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 


Re: AW: AW: auto completion search with solr using NGrams in SOLR

2012-09-05 Thread Ahmet Arslan
Hi,

You are trying to use two different approaches at the same time.

1) Remove 

<arr name="last-components">
  <str>suggest</str>
  <str>query</str>
</arr>

from your requestHandler.

2) Execute this query URL: suggest/?q=michael b&df=title&defType=lucene

And you will see my point.

--- On Wed, 9/5/12, aniljayanti anil.jaya...@gmail.com wrote:

 From: aniljayanti anil.jaya...@gmail.com
 Subject: Re: AW: AW: auto completion search with solr using NGrams in SOLR
 To: solr-user@lucene.apache.org
 Date: Wednesday, September 5, 2012, 7:29 AM
 Hi,
 
 thanks,
 
 I'm sending my whole configuration from the schema.xml and
 solrconfig.xml files.
 
 
 schema.xml
 ---
 
 <fieldType name="edgytext" class="solr.TextField"
            positionIncrementGap="100" omitNorms="true">
   <analyzer type="index">
     <tokenizer class="solr.KeywordTokenizerFactory" />
     <filter class="solr.LowerCaseFilterFactory" />
     <filter class="solr.PatternReplaceFilterFactory" pattern="\s+"
             replacement=" " replace="all"/>
     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
             maxGramSize="15" side="front" />
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.KeywordTokenizerFactory" />
     <filter class="solr.LowerCaseFilterFactory" />
     <filter class="solr.PatternReplaceFilterFactory" pattern="\s+"
             replacement=" " replace="all"/>
   </analyzer>
 </fieldType>
 
 
 <field name="title" type="edgytext" indexed="true" stored="true" />
 <field name="empname" type="edgytext" indexed="true" stored="true" />
 
 <field name="autocomplete_text" type="edgytext" indexed="true"
        stored="false" multiValued="true" omitNorms="true"
        omitTermFreqAndPositions="false" />
 
 <copyField source="title" dest="autocomplete_text"/>
 <copyField source="empname" dest="autocomplete_text"/>
 solrconfig.xml
 -
 <searchComponent class="solr.SpellCheckComponent" name="suggest">
   <lst name="spellchecker">
     <str name="name">suggest</str>
     <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
     <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup</str>
     <str name="storeDir">suggest</str>
     <str name="field">autocomplete_text</str>
     <bool name="exactMatchFirst">true</bool>
     <float name="threshold">0.005</float>
     <str name="buildOnCommit">true</str>
     <str name="buildOnOptimize">true</str>
   </lst>
   <lst name="spellchecker">
     <str name="name">jarowinkler</str>
     <str name="field">lowerfilt</str>
     <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
     <str name="spellcheckIndexDir">spellchecker</str>
   </lst>
   <str name="queryAnalyzerFieldType">edgytext</str>
 </searchComponent>
 
 <requestHandler class="org.apache.solr.handler.component.SearchHandler"
                 name="/suggest" startup="lazy">
   <lst name="defaults">
     <str name="spellcheck">true</str>
     <str name="spellcheck.dictionary">suggest</str>
     <str name="spellcheck.onlyMorePopular">true</str>
     <str name="spellcheck.count">5</str>
     <str name="spellcheck.collate">false</str>
     <str name="spellcheck.maxCollations">5</str>
     <str name="spellcheck.maxCollationTries">1000</str>
     <str name="spellcheck.collateExtendedResults">true</str>
   </lst>
   <arr name="last-components">
     <str>suggest</str>
     <str>query</str>
   </arr>
 </requestHandler>
 
 URL : suggest/?q=michael b
 -
 Response : 
 
 <?xml version="1.0" encoding="UTF-8" ?>
 <response>
   <lst name="responseHeader">
     <int name="status">0</int>
     <int name="QTime">3</int>
   </lst>
   <result name="response" numFound="0" start="0" />
   <lst name="spellcheck">
     <lst name="suggestions">
       <lst name="michael">
         <int name="numFound">10</int>
         <int name="startOffset">1</int>
         <int name="endOffset">8</int>
         <arr name="suggestion">
           <str>michael bully herbig</str>
           <str>michael bolton</str>
           <str>michael bolton: arias</str>
           <str>michael falch</str>
           <str>michael holm</str>
           <str>michael jackson</str>
           <str>michael neale</str>
           <str>michael penn</str>
           <str>michael salgado</str>
           <str>michael w. smith</str>
         </arr>
       </lst>
       <lst name="b">
         <int name="numFound">10</int>
         <int name="startOffset">9</int>
         <int name="endOffset">10</int>
         <arr name="suggestion">
           <str>b in the mix - the remixes</str>
           <str>b2k</str>
           <str>backstreet boys</str>
           <str>backyard babies</str>
           <str>banda maguey</str>
           <str>barbra streisand</str>
           <str>barry manilow</str>
           <str>benny goodman</str>
           <str>beny more</str>
           <str>beyonce</str>
         </arr>
       </lst>
       <str name="collation">michael bully herbig b in the mix - the remixes</str>
     </lst>
   </lst>
 </response>
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/auto-completion-search-with-solr-using-NGrams-in-SOLR-tp3998559p4005490.html
 Sent from the Solr - User mailing list archive at
 Nabble.com.



RE: Solr Cloud Implementation with Apache Tomcat

2012-09-05 Thread bsargurunathan
Hi Markus,

Can you please tell me the exact file name in the Tomcat folder?
I.e., where do I have to set the properties?
I am using a Windows machine and I have Tomcat 6.


Thanks,
Guru



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Cloud-Implementation-with-Apache-Tomcat-tp4005209p4005535.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Replication lag after cache optimizations

2012-09-05 Thread Damien Dudognon
Thanks for all the information.

 I'm not sure how exactly you are measuring/defining replication lag but 
 if you mean lag in how long until the newly replicated documents are 
 visible in searches

That is exactly what I meant.

I've attached the cache statistics.

In case you are interested, a few more details on our use case:
at the moment we have only a few hits on Solr (about 2 req/s), but we will quickly 
have more than 50 req/s. The requests are mainly facet requests. The index 
contains about 1.5M documents and we plan for a size of 15M documents in one year.

Best regards,
Damien


CACHE

name:queryResultCache  
class:   org.apache.solr.search.FastLRUCache  
version: 1.0  
description: Concurrent LRU Cache(maxSize=16384, initialSize=4096, 
minSize=14745, acceptableSize=15564, cleanupThread=false, autowarmCount=1024, 
regenerator=org.apache.solr.search.SolrIndexSearcher$3@3d762027)  
stats:  lookups : 4 
hits : 4 
hitratio : 1.00 
inserts : 0 
evictions : 0 
size : 1024 
warmupTime : 20 
cumulative_lookups : 1003454 
cumulative_hits : 894365 
cumulative_hitratio : 0.89 
cumulative_inserts : 120343 
cumulative_evictions : 0 

name:fieldCache  
class:   org.apache.solr.search.SolrFieldCacheMBean  
insanity_count : 0 
name:documentCache  
class:   org.apache.solr.search.FastLRUCache  
version: 1.0  
description: Concurrent LRU Cache(maxSize=16384, initialSize=4096, 
minSize=14745, acceptableSize=15564, cleanupThread=false, autowarmCount=1024, 
regenerator=null)  
stats:  lookups : 80 
hits : 60 
hitratio : 0.75 
inserts : 20 
evictions : 0 
size : 20 
warmupTime : 0 
cumulative_lookups : 10844723 
cumulative_hits : 8318341 
cumulative_hitratio : 0.76 
cumulative_inserts : 2526382 
cumulative_evictions : 0 

name:fieldValueCache  
class:   org.apache.solr.search.FastLRUCache  
version: 1.0  
description: Concurrent LRU Cache(maxSize=16384, initialSize=16384, 
minSize=14745, acceptableSize=15564, cleanupThread=false, autowarmCount=1024, 
regenerator=org.apache.solr.search.SolrIndexSearcher$1@38bdc9b3)  
stats:  lookups : 2 
hits : 2 
hitratio : 1.00 
inserts : 0 
evictions : 0 
size : 1 
warmupTime : 1369 
cumulative_lookups : 485281 
cumulative_hits : 485276 
cumulative_hitratio : 0.99 
cumulative_inserts : 2 
cumulative_evictions : 0 
item_tags : 
{field=tags,memSize=5804302,tindexSize=36148,time=1369,phase1=1357,nTerms=118241,bigTerms=0,termInstances=448772,uses=2}
 

name:filterCache  
class:   org.apache.solr.search.FastLRUCache  
version: 1.0  
description: Concurrent LRU Cache(maxSize=16384, initialSize=4096, 
minSize=14745, acceptableSize=15564, cleanupThread=false, autowarmCount=1024, 
regenerator=org.apache.solr.search.SolrIndexSearcher$2@340523df)  
stats:  lookups : 21 
hits : 21 
hitratio : 1.00 
inserts : 0 
evictions : 0 
size : 1024 
warmupTime : 1305 
cumulative_lookups : 5956615 
cumulative_hits : 5868136 
cumulative_hitratio : 0.98 
cumulative_inserts : 88479 
cumulative_evictions : 0

Re: AW: AW: auto completion search with solr using NGrams in SOLR

2012-09-05 Thread aniljayanti
HI,

Thanks,

I want to search on both title and empname. For example, when we use any
search engine like Google or Yahoo, we do not specify any type (name
or title or song...). Here (suggest/?q=michael
b&df=title&defType=lucene) we are specifying a title-only search.

I removed the said configuration from the solrconfig.xml file and got the result below.

<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="michael">
      <int name="numFound">10</int>
      <int name="startOffset">1</int>
      <int name="endOffset">8</int>
      <arr name="suggestion">
        <str>michael</str>
        <str>michael</str>
        <str>michael </str>
        <str>michael j</str>
        <str>michael ja</str>
        <str>michael jac</str>
        <str>michael jack</str>
        <str>michael jacks</str>
        <str>michael jackso</str>
        <str>michael jackson</str>
      </arr>
    </lst>
    <lst name="b">
      <int name="numFound">10</int>
      <int name="startOffset">9</int>
      <int name="endOffset">10</int>
      <arr name="suggestion">
        <str>b</str>
        <str>b</str>
        <str>ba</str>
        <str>bab</str>
        <str>bar</str>
        <str>barb</str>
        <str>be</str>
        <str>ben</str>
        <str>bi</str>
        <str>bl</str>
      </arr>
    </lst>
    <str name="collation">michael b</str>
  </lst>
</lst>

I sent my schema.xml and solrconfig.xml configurations. Please check.

Aniljayanti



--
View this message in context: 
http://lucene.472066.n3.nabble.com/auto-completion-search-with-solr-using-NGrams-in-SOLR-tp3998559p4005545.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr Cloud partitioning

2012-09-05 Thread dan sutton
Hi,

At the moment, partitioning with solrcloud is hash based on uniqueid.
What I'd like to do is have custom partitioning, e.g. based on date
(shard_MMYY).

I'm aware of https://issues.apache.org/jira/browse/SOLR-2592, but
after a cursory look it seems that with the latest patch, one might
end up with multiple partitions in the same shard, perhaps all (e.g.
if 2 or more partition hash values end up in the same range), which
I'd not want.

Has anyone else implemented custom shard partitioning for SolrCloud?

I think the answer is to have the partition class itself pluggable
(defaulting to a hash of unique_key, as now), but I am not sure how to pass the
solrConfig pluggable partition class through to ClusterState (which is
in solrj, not core). Any advice?

Cheers,
Dan


Re: Setting up two cores in solr.xml for Solr 4.0

2012-09-05 Thread Paul
I don't think I changed my solrconfig.xml file from the default that
was provided in the example folder for Solr 4.0.

On Tue, Sep 4, 2012 at 3:40 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:

 :   <core name="MYCORE_test" instanceDir="MYCORE" dataDir="MYCORE_test" />

 I'm pretty sure what you have above tells solr that core MYCORE_test
 should use the instanceDir MYCORE but ignore the <dataDir/> in that
 solrconfig.xml and use the one you specified.

 This on the other hand...

 :    <core name="MYCORE_test" instanceDir="MYCORE">
 :      <property name="dataDir" value="MYCORE_test" />
 :    </core>

 ...tells solr that the MYCORE_test SolrCore should use the instanceDir
 MYCORE, and when parsing that solrconfig.xml file it should set the
 variable ${dataDir} to be MYCORE_test -- but if your solrconfig.xml file
 does not ever refer to the ${dataDir} variable, it won't have any effect.

 so the question becomes -- what does your solrconfig.xml look like?


 -Hoss


Re: AW: AW: auto completion search with solr using NGrams in SOLR

2012-09-05 Thread Ahmet Arslan
 i want to search with title and empname both. 

I know; I gave that URL just to illustrate the idea.
If you try 
suggest/?q=michael b&df=title&defType=lucene&fl=title
you will see that what you are interested in appears in the results
section, not in the <lst name="spellcheck"> section.

 or title or song...). Here (suggest/?q=michael
 b&df=title&defType=lucene) we are specifying a
 title-only search. 

q=title:michael b OR empname:michael b&fl=title,empname would do the trick.


 I removed said configurations in solrconfig.xml file, got
 result like below.

If you removed it, then there shouldn't be a spellcheck response. And you are 
still looking for results in the wrong place.


Still see document after delete with commit in solr 4.0

2012-09-05 Thread Paul
I've recently upgraded to solr 4.0 from solr 3.5 and I think my delete
statement used to work, but now it doesn't seem to be deleting. I've
been experimenting around, and it seems like this should be the URL
for deleting the document with the uri of network_24.

In a browser, I first go here:

http://localhost:8983/solr/MYCORE/update?stream.body=%3Cdelete%3E%3Cquery%3Euri%3Anetwork_24%3C%2Fquery%3E%3C%2Fdelete%3E&commit=true

I get this response:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">5</int>
  </lst>
</response>

And this is in the log file:

(timestamp) org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start 
commit{flags=0,_version_=0,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false}
(timestamp) org.apache.solr.search.SolrIndexSearcher init
INFO: Opening Searcher@646dd60e main
(timestamp) org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
(timestamp) org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener sending requests to Searcher@646dd60e
main{StandardDirectoryReader(segments_2v:447 _4p(4.0.0.1):C3244)}
(timestamp) org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener done.
S(timestamp) org.apache.solr.core.SolrCore registerSearcher
INFO: [MYCORE] Registered new searcher Searcher@646dd60e
main{StandardDirectoryReader(segments_2v:447 _4p(4.0.0.1):C3244)}
(timestamp) org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [MYCORE] webapp=/solr path=/update
params={commit=true&stream.body=<delete><query>uri:network_24</query></delete>}
{deleteByQuery=uri:network_24,commit=} 0 5

But if I then go to this URL:

http://localhost:8983/solr/MYCORE/select?q=uri%3Anetwork_24&wt=xml

I get this response:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params">
      <str name="wt">xml</str>
      <str name="q">uri:network_24</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="name">network24</str>
      <str name="uri">network_24</str>
    </doc>
  </result>
</response>

Why didn't that document disappear?


Re: Still see document after delete with commit in solr 4.0

2012-09-05 Thread Jack Krupansky
Check to make sure that you are not stumbling into SOLR-3432: deleteByQuery 
silently ignored if updateLog is enabled, but {{_version_}} field does not 
exist in schema.


See:
https://issues.apache.org/jira/browse/SOLR-3432
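
In other words, if solrconfig.xml enables the update log, schema.xml must
also declare the _version_ field. A sketch of the two relevant pieces,
following the 4.0 example config (the ulog dir property may differ in
your setup):

<!-- solrconfig.xml -->
<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
</updateLog>

<!-- schema.xml -->
<field name="_version_" type="long" indexed="true" stored="true"/>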

-- Jack Krupansky

-Original Message- 
From: Paul

Sent: Wednesday, September 05, 2012 10:05 AM
To: solr-user
Subject: Still see document after delete with commit in solr 4.0

I've recently upgraded to solr 4.0 from solr 3.5 and I think my delete
statement used to work, but now it doesn't seem to be deleting. I've
been experimenting around, and it seems like this should be the URL
for deleting the document with the uri of network_24.

In a browser, I first go here:

http://localhost:8983/solr/MYCORE/update?stream.body=%3Cdelete%3E%3Cquery%3Euri%3Anetwork_24%3C%2Fquery%3E%3C%2Fdelete%3E&commit=true

I get this response:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">5</int>
  </lst>
</response>

And this is in the log file:

(timestamp) org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start 
commit{flags=0,_version_=0,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false}

(timestamp) org.apache.solr.search.SolrIndexSearcher init
INFO: Opening Searcher@646dd60e main
(timestamp) org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
(timestamp) org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener sending requests to Searcher@646dd60e
main{StandardDirectoryReader(segments_2v:447 _4p(4.0.0.1):C3244)}
(timestamp) org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener done.
S(timestamp) org.apache.solr.core.SolrCore registerSearcher
INFO: [MYCORE] Registered new searcher Searcher@646dd60e
main{StandardDirectoryReader(segments_2v:447 _4p(4.0.0.1):C3244)}
(timestamp) org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [MYCORE] webapp=/solr path=/update
params={commit=true&stream.body=<delete><query>uri:network_24</query></delete>}
{deleteByQuery=uri:network_24,commit=} 0 5

But if I then go to this URL:

http://localhost:8983/solr/MYCORE/select?q=uri%3Anetwork_24&wt=xml

I get this response:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params">
      <str name="wt">xml</str>
      <str name="q">uri:network_24</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="name">network24</str>
      <str name="uri">network_24</str>
    </doc>
  </result>
</response>

Why didn't that document disappear? 



RE: exception in highlighter when using phrase search

2012-09-05 Thread Yoni Amir
I think I found the cause of this. It is partially my fault, because I sent 
Solr a field with an empty value, but this is also a configuration problem.

https://issues.apache.org/jira/browse/SOLR-3792


-Original Message-
From: Yoni Amir [mailto:yoni.a...@actimize.com] 
Sent: Tuesday, September 04, 2012 3:53 PM
To: solr-user@lucene.apache.org
Subject: exception in highlighter when using phrase search

I got this problem with solr 4 beta and the highlighting component.

When I search for a phrase, such as "foo bar", everything works OK.
When I add highlighting, I get the exception below.
You can see from the first log line that I am searching only one field 
(all_text), but what is not visible in the log is that I am highlighting all 
fields in the document, with hl.requireFieldMatch=false and hl.fl=*.

INFO  (SolrCore.java:1670) - [rcmCore] webapp=/solr path=/select 
params={fq={!edismax}module:Alerts+and+bu:abcd+Region1&qf=attachment&qf=all_text&version=2&rows=20&wt=javabin&start=0&q=foo
 bar} hits=103 status=500 QTime=38
ERROR (SolrException.java:104) - null:java.lang.NullPointerException
   at 
org.apache.lucene.analysis.util.CharacterUtils$Java5CharacterUtils.fill(CharacterUtils.java:191)
   at 
org.apache.lucene.analysis.util.CharTokenizer.incrementToken(CharTokenizer.java:152)
   at 
org.apache.lucene.analysis.miscellaneous.WordDelimiterFilter.incrementToken(WordDelimiterFilter.java:209)
   at 
org.apache.lucene.analysis.util.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:50)
   at 
org.apache.lucene.analysis.miscellaneous.RemoveDuplicatesTokenFilter.incrementToken(RemoveDuplicatesTokenFilter.java:54)
   at 
org.apache.lucene.analysis.core.LowerCaseFilter.incrementToken(LowerCaseFilter.java:54)
   at 
org.apache.solr.highlight.TokenOrderingFilter.incrementToken(DefaultSolrHighlighter.java:629)
   at 
org.apache.lucene.analysis.CachingTokenFilter.fillCache(CachingTokenFilter.java:78)
   at 
org.apache.lucene.analysis.CachingTokenFilter.incrementToken(CachingTokenFilter.java:50)
   at 
org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:225)
   at 
org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:510)
   at 
org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:401)
   at 
org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:136)
   at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:206)
   at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1656)
   at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:454)
   at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:275)
   at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
   at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
   at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
   at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
   at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
   at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
   at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
   at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
   at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
   at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
   at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
   at java.lang.Thread.run(Thread.java:736)

Any idea?

Thanks,
Yoni


Re: Solr 4.0 BETA Replication problems on Tomcat

2012-09-05 Thread Ravi Solr
Wow, that was quick. Thank you very much, Mr. Siren. I shall remove the
compression node from the solrconfig.xml and let you know how it went.

Thanks,

Ravi Kiran Bhaskar

On Wed, Sep 5, 2012 at 2:54 AM, Sami Siren ssi...@gmail.com wrote:
 I opened SOLR-3789. As a workaround you can remove <str
 name="compression">internal</str> from the config and it should work.

 --
  Sami Siren

 On Wed, Sep 5, 2012 at 5:58 AM, Ravi Solr ravis...@gmail.com wrote:
 Hello,
 I have a very simple setup: one master and one slave, configured
 as below, but replication keeps failing with the stack trace shown
 below. Note that 3.6 works fine on the same machines, so I am thinking
 that I am missing something in the configuration with regard to Solr
 4.0... can somebody kindly let me know if I am missing something? I am
 running Solr 4.0 on Tomcat 7.0.29 with Java 6. FYI, I never had any
 problem with Solr on Glassfish; this is the first time I am using it on
 Tomcat.

 On Master

 <requestHandler name="/replication" class="solr.ReplicationHandler">
   <lst name="master">
     <str name="replicateAfter">commit</str>
     <str name="replicateAfter">optimize</str>
     <str name="confFiles">schema.xml,stopwords.txt,synonyms.txt</str>
     <str name="commitReserveDuration">00:00:10</str>
   </lst>
 </requestHandler>

 On Slave

 <requestHandler name="/replication" class="solr.ReplicationHandler">
   <lst name="slave">
     <str name="masterUrl">http://testslave:8080/solr/mycore/replication</str>
     <str name="pollInterval">00:00:50</str>
     <str name="compression">internal</str>
     <str name="httpConnTimeout">5000</str>
     <str name="httpReadTimeout">1</str>
   </lst>
 </requestHandler>


 Error

 22:44:10 WARNING SnapPuller  Error in fetching packets

 java.util.zip.ZipException: unknown compression method
 at 
 java.util.zip.InflaterInputStream.read(InflaterInputStream.java:147)
 at 
 org.apache.solr.common.util.FastInputStream.readWrappedStream(FastInputStream.java:79)
 at 
 org.apache.solr.common.util.FastInputStream.refill(FastInputStream.java:88)
 at 
 org.apache.solr.common.util.FastInputStream.read(FastInputStream.java:124)
 at 
 org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:149)
 at 
 org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:144)
 at 
 org.apache.solr.handler.SnapPuller$FileFetcher.fetchPackets(SnapPuller.java:1024)
 at 
 org.apache.solr.handler.SnapPuller$FileFetcher.fetchFile(SnapPuller.java:985)
 at 
 org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:627)
 at 
 org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:331)
 at 
 org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:297)
 at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:175)
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
 at 
 java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
 at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)

 22:44:10 SEVERE  ReplicationHandler  SnapPull failed:
 org.apache.solr.common.SolrException: Unable to download
 _3_Lucene40_0.tip completely. Downloaded 0!=170 at
 org.apache.solr.handler.SnapPuller$FileFetcher.cleanup(SnapPuller.java:1115)
 at 
 org.apache.solr.handler.SnapPuller$FileFetcher.fetchFile(SnapPuller.java:999)
 at org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:627)
 at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:331)
 at 
 org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:297)
 at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:175) at
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
 at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
 at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150) at
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
 at 
 

Website (crawler for) indexing

2012-09-05 Thread Lochschmied, Alexander
This may be a bit off topic: how do you index an existing website and control 
the data going into the index?

We already have Java code to process the HTML (or XHTML) and turn it into a 
SolrJ Document (removing tags and other things we do not want in the index). We 
use SolrJ for indexing.
So I guess the question is essentially which Java crawler could be useful.

We used to use wget on the command line in our publishing process, but we no 
longer want to do that.

Thanks,
Alexander



RE: Website (crawler for) indexing

2012-09-05 Thread Markus Jelsma
Please take a look at the Apache Nutch project.  
http://nutch.apache.org/
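
For reference, a Nutch 1.x crawl that posts its output straight into Solr
looked roughly like this at the time (a sketch; the seed directory, depth,
topN, and Solr URL are placeholders):

bin/nutch crawl urls -solr http://localhost:8983/solr/ -depth 3 -topN 50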
 
-Original message-
 From:Lochschmied, Alexander alexander.lochschm...@vishay.com
 Sent: Wed 05-Sep-2012 17:09
 To: solr-user@lucene.apache.org
 Subject: Website (crawler for) indexing
 
 This may be a bit off topic: how do you index an existing website and control 
 the data going into the index?
 
 We already have Java code to process the HTML (or XHTML) and turn it into a 
 SolrJ Document (removing tags and other things we do not want in the index). 
 We use SolrJ for indexing.
 So I guess the question is essentially which Java crawler could be useful.
 
 We used to use wget on the command line in our publishing process, but we no 
 longer want to do that.
 
 Thanks,
 Alexander
 
 


Re: Website (crawler for) indexing

2012-09-05 Thread Rafał Kuć
Hello!

You can implement your own crawler using Droids
(http://incubator.apache.org/droids/) or use Apache Nutch
(http://nutch.apache.org/), which is very easy to integrate with Solr
and is a very powerful crawler.

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 This may be a bit off topic: how do you index an existing website
 and control the data going into the index?

 We already have Java code to process the HTML (or XHTML) and turn
 it into a SolrJ Document (removing tags and other things we do not
 want in the index). We use SolrJ for indexing.
 So I guess the question is essentially which Java crawler could be useful.

 We used to use wget on the command line in our publishing process, but we no 
 longer want to do that.

 Thanks,
 Alexander



Re: Delete all documents in the index

2012-09-05 Thread Jack Krupansky
Check to make sure that you are not stumbling into SOLR-3432: deleteByQuery 
silently ignored if updateLog is enabled, but {{_version_}} field does not 
exist in schema.


See:
https://issues.apache.org/jira/browse/SOLR-3432

This could happen if you kept the new 4.0 solrconfig.xml, but copied in your 
pre-4.0 schema.xml.


-- Jack Krupansky

-Original Message- 
From: Rohit Harchandani

Sent: Wednesday, September 05, 2012 12:48 PM
To: solr-user@lucene.apache.org
Subject: Delete all documents in the index

Hi,
I am having difficulty deleting documents from the index using curl. The
URLs I tried were:
curl "http://localhost:9020/solr/core1/update/?stream.body=<delete><query>*:*</query></delete>&commit=true"
curl "http://localhost:9020/solr/core1/update/?commit=true" -H
"Content-Type: text/xml" --data-binary '<delete><query>id:[* TO
*]</query></delete>'
curl "http://localhost:9020/solr/core1/update/?commit=true" -H
"Content-Type: text/xml" --data-binary '<delete><query>*:*</query></delete>'
I also tried:
curl 
"http://localhost:9020/solr/core1/update/?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E&commit=true"

as suggested on some forums. I get a response with status=0 in all cases,
but none of the above seem to work.
When I run
curl "http://localhost:9020/solr/core1/select?q=*:*&rows=0&wt=xml"
I still get a value for numFound.

I am currently using solr 4.0 beta version.

Thanks for your help in advance.
Regards,
Rohit 



Re: Delete all documents in the index

2012-09-05 Thread Michael Della Bitta
Rohit:

If it's an option, the easiest thing to do is to shut down your servlet
container, run rm -r * inside the data directory, and then restart the
container.

Michael Della Bitta


Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
www.appinions.com
Where Influence Isn’t a Game


On Wed, Sep 5, 2012 at 12:56 PM, Jack Krupansky j...@basetechnology.com wrote:
 Check to make sure that you are not stumbling into SOLR-3432: deleteByQuery
 silently ignored if updateLog is enabled, but {{_version_}} field does not
 exist in schema.

 See:
 https://issues.apache.org/jira/browse/SOLR-3432

 This could happen if you kept the new 4.0 solrconfig.xml, but copied in your
 pre-4.0 schema.xml.

 -- Jack Krupansky

 -Original Message- From: Rohit Harchandani
 Sent: Wednesday, September 05, 2012 12:48 PM
 To: solr-user@lucene.apache.org
 Subject: Delete all documents in the index


 Hi,
 I am having difficulty deleting documents from the index using curl. The
 URLs I tried were:
 curl "http://localhost:9020/solr/core1/update/?stream.body=<delete><query>*:*</query></delete>&commit=true"
 curl "http://localhost:9020/solr/core1/update/?commit=true" -H
 "Content-Type: text/xml" --data-binary '<delete><query>id:[* TO
 *]</query></delete>'
 curl "http://localhost:9020/solr/core1/update/?commit=true" -H
 "Content-Type: text/xml" --data-binary '<delete><query>*:*</query></delete>'
 I also tried:
 curl 
 "http://localhost:9020/solr/core1/update/?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E&commit=true"
 
 as suggested on some forums. I get a response with status=0 in all cases,
 but none of the above seem to work.
 When I run
 curl "http://localhost:9020/solr/core1/select?q=*:*&rows=0&wt=xml"
 I still get a value for numFound.

 I am currently using solr 4.0 beta version.

 Thanks for your help in advance.
 Regards,
 Rohit


Solr index on Amazon S3

2012-09-05 Thread Nicolas de Saint-Aubert
Hi,

We currently share a single read-only Solr index on an NFS mount accessed by
various Solr instances from various devices, which gives us a highly
performant cluster setup. We would like to migrate to Amazon or
another cloud. Is there any way (compatibility-wise) to have the Solr index on
the Amazon S3 cloud file system, so that we could access a single index
from various Solr instances as we currently do?

Thanks for helping !


EdgeNgramTokenFilter and positions

2012-09-05 Thread Walter Underwood
In the analysis page, the n-grams produced by EdgeNgramTokenFilter are at 
sequential positions. This seems wrong, because an n-gram is associated with a 
source token at a specific position. It also really messes up phrase matches.

With the source text "fleen", these positions and tokens are generated:

1,fl
2,fle
3,flee
4,fleen
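
For reference, an EdgeNGramFilterFactory configured like the following
sketch reproduces exactly those grams (gram sizes chosen to match the
output above):

<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="5" side="front"/>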

Is this a known bug? Fixed? I'm running 3.3.

wunder
--
Walter Underwood
Search Guy
wun...@chegg.com





Re: Delete all documents in the index

2012-09-05 Thread Rohit Harchandani
Thanks, everyone. Adding the _version_ field to the schema worked.
Deleting the data directory works for me too, but I was not sure why deleting
using curl was not working.

On Wed, Sep 5, 2012 at 1:49 PM, Michael Della Bitta 
michael.della.bi...@appinions.com wrote:

 Rohit:

 If it's an option, the easiest thing to do is to shut down your servlet
 container, run rm -r * inside the data directory, and then restart the
 container.

 Michael Della Bitta

 
 Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
 www.appinions.com
 Where Influence Isn’t a Game


 On Wed, Sep 5, 2012 at 12:56 PM, Jack Krupansky j...@basetechnology.com
 wrote:
  Check to make sure that you are not stumbling into SOLR-3432:
 deleteByQuery
  silently ignored if updateLog is enabled, but {{_version_}} field does
 not
  exist in schema.
 
  See:
  https://issues.apache.org/jira/browse/SOLR-3432
 
  This could happen if you kept the new 4.0 solrconfig.xml, but copied in
 your
  pre-4.0 schema.xml.
 
  -- Jack Krupansky
 
  -Original Message- From: Rohit Harchandani
  Sent: Wednesday, September 05, 2012 12:48 PM
  To: solr-user@lucene.apache.org
  Subject: Delete all documents in the index
 
 
  Hi,
  I am having difficulty deleting documents from the index using curl. The
  URLs I tried were:
  curl "http://localhost:9020/solr/core1/update/?stream.body=<delete><query>*:*</query></delete>&commit=true"
  curl "http://localhost:9020/solr/core1/update/?commit=true" -H
  "Content-Type: text/xml" --data-binary '<delete><query>id:[* TO
  *]</query></delete>'
  curl "http://localhost:9020/solr/core1/update/?commit=true" -H
  "Content-Type: text/xml" --data-binary '<delete><query>*:*</query></delete>'
  I also tried:
  curl 
  "http://localhost:9020/solr/core1/update/?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E&commit=true"
  
  as suggested on some forums. I get a response with status=0 in all cases,
  but none of the above seem to work.
  When I run
  curl "http://localhost:9020/solr/core1/select?q=*:*&rows=0&wt=xml"
  I still get a value for numFound.
 
  I am currently using solr 4.0 beta version.
 
  Thanks for your help in advance.
  Regards,
  Rohit



Re: Solr index on Amazon S3

2012-09-05 Thread Michael Della Bitta
Amazon doesn't have a prebuilt network filesystem that's mountable on
multiple hosts out of the box. The closest thing would be setting up
NFS among your hosts yourself, but at that point it'd probably be
easier to set up Solr replication.

Michael Della Bitta


Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
www.appinions.com
Where Influence Isn’t a Game


On Wed, Sep 5, 2012 at 1:26 PM, Nicolas de Saint-Aubert
dsanico...@gmail.com wrote:
 Hi,

 We currently share a single read-only Solr index on an NFS mount accessed by
 various Solr instances from various devices, which gives us a highly
 performant cluster setup. We would like to migrate to Amazon or
 another cloud. Is there any way (compatibility-wise) to have the Solr index on
 the Amazon S3 cloud file system, so that we could access a single index
 from various Solr instances as we currently do?

 Thanks for helping !


Re: Still see document after delete with commit in solr 4.0

2012-09-05 Thread Paul
That was exactly it. I added the following line to schema.xml and it now works.

<field name="_version_" type="long" indexed="true" stored="true"/>


On Wed, Sep 5, 2012 at 10:13 AM, Jack Krupansky j...@basetechnology.com wrote:
 Check to make sure that you are not stumbling into SOLR-3432: deleteByQuery
 silently ignored if updateLog is enabled, but {{_version_}} field does not
 exist in schema.

 See:
 https://issues.apache.org/jira/browse/SOLR-3432

 -- Jack Krupansky

 -Original Message- From: Paul
 Sent: Wednesday, September 05, 2012 10:05 AM
 To: solr-user
 Subject: Still see document after delete with commit in solr 4.0


 I've recently upgraded to solr 4.0 from solr 3.5 and I think my delete
 statement used to work, but now it doesn't seem to be deleting. I've
 been experimenting around, and it seems like this should be the URL
 for deleting the document with the uri of network_24.

 In a browser, I first go here:

 http://localhost:8983/solr/MYCORE/update?stream.body=%3Cdelete%3E%3Cquery%3Euri%3Anetwork_24%3C%2Fquery%3E%3C%2Fdelete%3E&commit=true

 I get this response:

 <response>
   <lst name="responseHeader">
     <int name="status">0</int>
     <int name="QTime">5</int>
   </lst>
 </response>

 And this is in the log file:

 (timestamp) org.apache.solr.update.DirectUpdateHandler2 commit
 INFO: start
 commit{flags=0,_version_=0,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false}
 (timestamp) org.apache.solr.search.SolrIndexSearcher init
 INFO: Opening Searcher@646dd60e main
 (timestamp) org.apache.solr.update.DirectUpdateHandler2 commit
 INFO: end_commit_flush
 (timestamp) org.apache.solr.core.QuerySenderListener newSearcher
 INFO: QuerySenderListener sending requests to Searcher@646dd60e
 main{StandardDirectoryReader(segments_2v:447 _4p(4.0.0.1):C3244)}
 (timestamp) org.apache.solr.core.QuerySenderListener newSearcher
 INFO: QuerySenderListener done.
 S(timestamp) org.apache.solr.core.SolrCore registerSearcher
 INFO: [MYCORE] Registered new searcher Searcher@646dd60e
 main{StandardDirectoryReader(segments_2v:447 _4p(4.0.0.1):C3244)}
 (timestamp) org.apache.solr.update.processor.LogUpdateProcessor finish
 INFO: [MYCORE] webapp=/solr path=/update
 params={commit=true&stream.body=<delete><query>uri:network_24</query></delete>}
 {deleteByQuery=uri:network_24,commit=} 0 5

 But if I then go to this URL:

 http://localhost:8983/solr/MYCORE/select?q=uri%3Anetwork_24&wt=xml

 I get this response:

 <response>
   <lst name="responseHeader">
     <int name="status">0</int>
     <int name="QTime">1</int>
     <lst name="params">
       <str name="wt">xml</str>
       <str name="q">uri:network_24</str>
     </lst>
   </lst>
   <result name="response" numFound="1" start="0">
     <doc>
       <str name="name">network24</str>
       <str name="uri">network_24</str>
     </doc>
   </result>
 </response>

 Why didn't that document disappear?


Re: Solr index on Amazon S3

2012-09-05 Thread Erik Hatcher
Nicolas -

Can you elaborate on your use and configuration of Solr on NFS? What lock 
factory are you using? (You had to change from the default, right?)

And how are you coordinating updates/commits to the other servers? Where does 
indexing occur, and then how are commits sent to the NFS-mounted servers?

Thanks for sharing anything you can about this.

Erik

On Sep 5, 2012, at 13:26 , Nicolas de Saint-Aubert wrote:

 Hi,
 
 We currently share a single read-only Solr index on an NFS mount accessed by
 various Solr instances from various devices, which gives us a highly
 performant cluster setup. We would like to migrate to Amazon or
 another cloud. Is there any way (compatibility-wise) to have the Solr index on
 the Amazon S3 cloud file system, so that we could access a single index
 from various Solr instances as we currently do?
 
 Thanks for helping !



Re: Still see document after delete with commit in solr 4.0

2012-09-05 Thread Chris Hostetter

: That was exactly it. I added the following line to schema.xml and it now 
works.
: 
: <field name="_version_" type="long" indexed="true" stored="true"/>

Just to be clear: how exactly did you "upgrade to solr 4.0 from solr 3.5" 
-- did you throw out your old solrconfig.xml and use the example 
solrconfig.xml from 4.0, but keep your 3.5 schema.xml?  Do you in fact 
have an <updateLog ... /> in your solrconfig.xml?

(if so: then this is all known as part of SOLR-3432, and won't affect any 
users of 4.0-final -- but i want to be absolutely sure there isn't some 
other edge case of this bug)


-Hoss


Re: Setting up two cores in solr.xml for Solr 4.0

2012-09-05 Thread Chris Hostetter

: I don't think I changed by solrconfig.xml file from the default that
: was provided in the example folder for solr 4.0.

ok ... well the Solr 4.0-BETA example solrconfig.xml has this in it...

  <dataDir>${solr.data.dir:}</dataDir>

So if you want to override the dataDir using a property like your second 
example, it should be something like...

   <core name="MYCORE_test" instanceDir="MYCORE">
     <property name="solr.data.dir" value="MYCORE_test" />
   </core>

...the property name used in the solrconfig.xml has to match the property 
name you use when declaring the core, or it won't get used and you'll get 
the default behavior.  solr.data.dir isn't special here -- you could use 
any number of properties in your solrconfig.xml, and declare them when 
defining your individual cores.

that's very different from your other example...

   <core name="MYCORE_test" instanceDir="MYCORE" dataDir="MYCORE_test" />

...which doesn't use properties at all, and says this is what the dataDir 
should be, regardless of what the <dataDir>...</dataDir> looks like in the 
solrconfig.xml


(at least: i'm pretty sure that's how it works)



-Hoss


Re: Still see document after delete with commit in solr 4.0

2012-09-05 Thread Paul
Actually, I didn't technically "upgrade". I downloaded the new
version, grabbed the example, and pasted the fields from my schema
into the new one. So the only two files I changed from the example are
schema.xml and solr.xml.

Then I reindexed everything from scratch so there was no old index
involved, either.

On Wed, Sep 5, 2012 at 2:42 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:

 : That was exactly it. I added the following line to schema.xml and it now 
 works.
 :
 : <field name="_version_" type="long" indexed="true" stored="true"/>

 Just to be clear: how exactly did you "upgrade to solr 4.0 from solr 3.5"
 -- did you throw out your old solrconfig.xml and use the example
 solrconfig.xml from 4.0, but keep your 3.5 schema.xml?  Do you in fact
 have an <updateLog ... /> in your solrconfig.xml?

 (if so: then this is all known as part of SOLR-3432, and won't affect any
 users of 4.0-final -- but i want to be absolutely sure there isn't some
 other edge case of this bug)


 -Hoss


Re: EdgeNgramTokenFilter and positions

2012-09-05 Thread Jack Krupansky
I don't see a Jira for it, but I do see the bad behavior in both Solr 3.6 
and 4.0-BETA on the Solr admin analysis page.


Interestingly, the screen shot for LUCENE-3642 does in fact show the 
(improperly) incremented positions for successive ngrams.


See:
https://issues.apache.org/jira/browse/LUCENE-3642

I'm surprised that nobody noticed the bogus positions back then.

Technically, this is a Lucene issue.

-- Jack Krupansky

-Original Message- 
From: Walter Underwood

Sent: Wednesday, September 05, 2012 1:51 PM
To: solr-user@lucene.apache.org
Subject: EdgeNgramTokenFilter and positions

In the analysis page, the n-grams produced by EdgeNgramTokenFilter are at 
sequential positions. This seems wrong, because an n-gram is associated with 
a source token at a specific position. It also really messes up phrase 
matches.


With the source text "fleen", these positions and tokens are generated:

1,fl
2,fle
3,flee
4,fleen

Is this a known bug? Fixed? I'm running 3.3.

wunder
--
Walter Underwood
Search Guy
wun...@chegg.com





Re: Still see document after delete with commit in solr 4.0

2012-09-05 Thread Jack Krupansky
And when you pasted your 3.5 fields into the 4.0 schema, did you delete the 
existing fields (including _version_) at the same time?


-- Jack Krupansky

-Original Message- 
From: Paul

Sent: Wednesday, September 05, 2012 4:32 PM
To: solr-user@lucene.apache.org
Subject: Re: Still see document after delete with commit in solr 4.0

Actually, I didn't technically "upgrade". I downloaded the new
version, grabbed the example, and pasted the fields from my schema
into the new one. So the only two files I changed from the example are
schema.xml and solr.xml.

Then I reindexed everything from scratch so there was no old index
involved, either.

On Wed, Sep 5, 2012 at 2:42 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:


: That was exactly it. I added the following line to schema.xml and it now 
works.

:
: <field name="_version_" type="long" indexed="true" stored="true"/>

Just to be clear: how exactly did you "upgrade to solr 4.0 from solr 3.5"
-- did you throw out your old solrconfig.xml and use the example
solrconfig.xml from 4.0, but keep your 3.5 schema.xml?  Do you in fact
have an <updateLog ... /> in your solrconfig.xml?

(if so: then this is all known as part of SOLR-3432, and won't affect any
users of 4.0-final -- but i want to be absolutely sure there isn't some
other edge case of this bug)


-Hoss 




Re: Still see document after delete with commit in solr 4.0

2012-09-05 Thread Chris Hostetter

 : Actually, I didn't technically "upgrade". I downloaded the new
 : version, grabbed the example, and pasted the fields from my schema
 : into the new one. So the only two files I changed from the example are
 : schema.xml and solr.xml.

ok -- so with the fix for SOLR-3432, anyone who tries similar steps with 
4.0-final will get a clear error on startup -- that was my main concern.  
thanks for clarifying.


-Hoss


Re: use of filter queries in Lucene/Solr Alpha40 and Beta4.0

2012-09-05 Thread Chris Hostetter

: Subject: Re: use of filter queries in Lucene/Solr Alpha40 and Beta4.0

Günter, This is definitely strange

The good news is, i can reproduce your problem. 
The bad news is, i can reproduce your problem - and i have no idea what's 
causing it.

I've opened SOLR-3793 to try to get to the bottom of this, and included 
some basic steps to demonstrate the bug using the Solr 4.0-BETA example 
data, but i'm really not sure what the problem might be...

https://issues.apache.org/jira/browse/SOLR-3793


-Hoss

Re: Solr 4.0 BETA Replication problems on Tomcat

2012-09-05 Thread Ravi Solr
The replication finally worked after I removed the compression setting
from the solrconfig.xml on the slave. Thanks for providing the
workaround.

Ravi Kiran

On Wed, Sep 5, 2012 at 10:23 AM, Ravi Solr ravis...@gmail.com wrote:
 Wow, that was quick. Thank you very much, Mr. Siren. I shall remove the
 compression node from the solrconfig.xml and let you know how it went.

 Thanks,

 Ravi Kiran Bhaskar

 On Wed, Sep 5, 2012 at 2:54 AM, Sami Siren ssi...@gmail.com wrote:
 I opened SOLR-3789. As a workaround you can remove <str
 name="compression">internal</str> from the config and it should work.

 --
  Sami Siren

 On Wed, Sep 5, 2012 at 5:58 AM, Ravi Solr ravis...@gmail.com wrote:
 Hello,
 I have a very simple setup: one master and one slave, configured
 as below, but replication keeps failing with the stack trace shown
 below. Note that 3.6 works fine on the same machines, so I am thinking
 that I am missing something in the configuration with regard to Solr
 4.0... can somebody kindly let me know if I am missing something? I am
 running Solr 4.0 on Tomcat 7.0.29 with Java 6. FYI, I never had any
 problem with Solr on Glassfish; this is the first time I am using it on
 Tomcat.

 On Master

 <requestHandler name="/replication" class="solr.ReplicationHandler">
   <lst name="master">
     <str name="replicateAfter">commit</str>
     <str name="replicateAfter">optimize</str>
     <str name="confFiles">schema.xml,stopwords.txt,synonyms.txt</str>
     <str name="commitReserveDuration">00:00:10</str>
   </lst>
 </requestHandler>

 On Slave

 <requestHandler name="/replication" class="solr.ReplicationHandler">
   <lst name="slave">
     <str name="masterUrl">http://testslave:8080/solr/mycore/replication</str>
     <str name="pollInterval">00:00:50</str>
     <str name="compression">internal</str>
     <str name="httpConnTimeout">5000</str>
     <str name="httpReadTimeout">1</str>
   </lst>
 </requestHandler>


 Error

 22:44:10 WARNING SnapPuller  Error in fetching packets

 java.util.zip.ZipException: unknown compression method
 at 
 java.util.zip.InflaterInputStream.read(InflaterInputStream.java:147)
 at 
 org.apache.solr.common.util.FastInputStream.readWrappedStream(FastInputStream.java:79)
 at 
 org.apache.solr.common.util.FastInputStream.refill(FastInputStream.java:88)
 at 
 org.apache.solr.common.util.FastInputStream.read(FastInputStream.java:124)
 at 
 org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:149)
 at 
 org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:144)
 at 
 org.apache.solr.handler.SnapPuller$FileFetcher.fetchPackets(SnapPuller.java:1024)
 at 
 org.apache.solr.handler.SnapPuller$FileFetcher.fetchFile(SnapPuller.java:985)
 at 
 org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:627)
 at 
 org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:331)
 at 
 org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:297)
 at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:175)
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
 at 
 java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
 at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)

 22:44:10SEVERE  ReplicationHandler  SnapPull failed
 :org.apache.solr.common.SolrException: Unable to download
 _3_Lucene40_0.tip completely. Downloaded 0!=170 at
 org.apache.solr.handler.SnapPuller$FileFetcher.cleanup(SnapPuller.java:1115)
 at 
 org.apache.solr.handler.SnapPuller$FileFetcher.fetchFile(SnapPuller.java:999)
 at 
 org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:627)
 at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:331)
 at 
 org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:297)
 at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:175) at
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
 at 
 java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
 at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150) at
 

Duplicates in the suggester.

2012-09-05 Thread sharath jagannath
Not sure whether this is a duplicate question; I did try to browse through the
archive and did not find anything specific to what I was looking for.
I see duplicates in the dictionary if I update the document concurrently.

I am using Solr 3.6.1 with the following configuration for the suggester:

Solr Config:
   <searchComponent name="suggest" class="solr.SpellCheckComponent">
     <str name="queryAnalyzerFieldType">text_auto_suggest</str>
     <lst name="spellchecker">
       <str name="name">suggest</str>
       <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
       <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
       <str name="field">name_auto</str>
       <str name="buildOnCommit">true</str>
     </lst>
   </searchComponent>

   <requestHandler name="/suggest"
                   class="org.apache.solr.handler.component.SearchHandler">
     <lst name="defaults">
       <str name="spellcheck">true</str>
       <str name="spellcheck.dictionary">suggest</str>
       <str name="spellcheck.count">10</str>
     </lst>
     <arr name="components">
       <str>suggest</str>
     </arr>
   </requestHandler>
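
With that handler in place, a request along these lines exercises the
suggester (host/port from the stock example; the prefix is arbitrary):

curl 'http://localhost:8983/solr/suggest?q=foo_b'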

Schema:
<fieldType name="text_auto_suggest" class="solr.TextField" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- <tokenizer class="solr.KeywordTokenizerFactory"/> -->
    <!-- <filter class="solr.LowerCaseFilterFactory"/> -->
    <filter class="solr.ClassicFilterFactory"/>
    <!-- <filter class="solr.LengthFilterFactory" min="2"/> -->
  </analyzer>

  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.ClassicFilterFactory"/>
    <!-- <filter class="solr.LengthFilterFactory" min="2"/> -->
  </analyzer>
</fieldType>


<field name="name_auto" type="text_auto_suggest" indexed="true"
       stored="true" multiValued="false"/>

Example text I would be indexing for the suggester:
foo_bar %|4%|1%|food

%| - used as a combiner
Part 1: foo_bar, name of the entity
Part 2: number of activities (application-specific) on the entity
Part 3: id of the document
Part 4: food, category of the entity

As I mentioned earlier, I saw duplicates in the spellcheck index documents
when I updated them concurrently.

<arr name="suggestion">
  <str>foo_bar %|4%|1%|food</str>
  <str>foo_bar %|1%|1%|food</str>
  <str>foo_bar %|2%|1%|food</str>
  <str>foo_bar %|3%|1%|food</str>
</arr>

I do not see duplicates when I update the documents sequentially. I strongly
suspect this is happening because of the way I am combining multiple
fields using %|.
I would appreciate it if somebody could suggest any suitable changes that
would help me with this issue.


-- 
Thanks,
Sharath


Re: Delete all documents in the index

2012-09-05 Thread Mark Mandel
Thanks for posting this!

I ran into exactly this issue yesterday, and ended up deleting the files to
get around it.

Mark

Sent from my mobile doohickey.
On Sep 6, 2012 4:13 AM, Rohit Harchandani rhar...@gmail.com wrote:

 Thanks everyone. Adding the _version_ field to the schema worked.
 Deleting the data directory works for me, but I was not sure why deleting
 using curl was not working.
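
 For the record, with the field in place this is the form that now works for
 me -- same host and core as in my message below, so adjust as needed:

 curl "http://localhost:9020/solr/core1/update?commit=true" -H
 "Content-Type: text/xml" --data-binary '<delete><query>*:*</query></delete>'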

 On Wed, Sep 5, 2012 at 1:49 PM, Michael Della Bitta 
 michael.della.bi...@appinions.com wrote:

  Rohit:
 
  If it's easy, the easiest thing to do is to turn off your servlet
  container, rm -r * inside of the data directory, and then restart the
  container.
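  
   Concretely, something like this -- the paths and scripts are placeholders
   for whatever your install uses:
  
   $TOMCAT_HOME/bin/shutdown.sh         # stop the servlet container
   rm -r /path/to/solr/core1/data/*     # wipe the index files
   $TOMCAT_HOME/bin/startup.sh          # restart; Solr creates a fresh empty index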
 
  Michael Della Bitta
 
  
  Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
  www.appinions.com
  Where Influence Isn’t a Game
 
 
  On Wed, Sep 5, 2012 at 12:56 PM, Jack Krupansky j...@basetechnology.com
 
  wrote:
   Check to make sure that you are not stumbling into SOLR-3432:
  deleteByQuery
   silently ignored if updateLog is enabled, but {{_version_}} field does
  not
   exist in schema.
  
   See:
   https://issues.apache.org/jira/browse/SOLR-3432
  
   This could happen if you kept the new 4.0 solrconfig.xml, but copied in
  your
   pre-4.0 schema.xml.
  
   -- Jack Krupansky
  
   -Original Message- From: Rohit Harchandani
   Sent: Wednesday, September 05, 2012 12:48 PM
   To: solr-user@lucene.apache.org
   Subject: Delete all documents in the index
  
  
   Hi,
   I am having difficulty deleting documents from the index using curl. The
   urls I tried were:

   curl "http://localhost:9020/solr/core1/update/?stream.body=<delete><query>*:*</query></delete>&commit=true"

   curl "http://localhost:9020/solr/core1/update/?commit=true" -H
   "Content-Type: text/xml" --data-binary '<delete><query>id:[* TO *]</query></delete>'

   curl "http://localhost:9020/solr/core1/update/?commit=true" -H
   "Content-Type: text/xml" --data-binary '<delete><query>*:*</query></delete>'

   I also tried:

   curl "http://localhost:9020/solr/core1/update/?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E&commit=true"
   
   as suggested on some forums. I get a response with status=0 in all cases,
   but none of the above seem to work.
   When I run
   curl "http://localhost:9020/solr/core1/select?q=*:*&rows=0&wt=xml"
   I still get a value for numFound.
  
   I am currently using solr 4.0 beta version.
  
   Thanks for your help in advance.
   Regards,
   Rohit
 



Solr not allowing persistent HTTP connections

2012-09-05 Thread Aleksey Vorona

Hi,

Running the example Solr from the 3.6.1 distribution, I cannot get it to 
keep persistent HTTP connections:


$ ab -c 1 -n 100 -k 'http://localhost:8983/solr/select?q=*:*' | grep 
Keep-Alive

Keep-Alive requests:0

What should I change to fix that?

P.S. We have the same issue in production with Jetty 7, but I thought it 
would be better to ask about Solr example, since it is easier for anyone 
to reproduce the issue.


-- Aleksey


Re: Solr not allowing persistent HTTP connections

2012-09-05 Thread Aleksey Vorona
Some extra information: if I force curl to use HTTP 1.0, it is 
more visible that Solr doesn't allow persistent connections:


$ curl -v -0 'http://localhost:8983/solr/select?q=*:*' -H 'Connection: Keep-Alive'
* About to connect() to localhost port 8983 (#0)
*   Trying ::1... connected
> GET /solr/select?q=*:* HTTP/1.0
> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> Host: localhost:8983
> Accept: */*
> Connection: Keep-Alive
>
< HTTP/1.1 200 OK
< Content-Type: application/xml; charset=UTF-8
* no chunk, no close, no size. Assume close to signal end

<?xml version="1.0" encoding="UTF-8"?>
<response>
...removed the rest of the response body...

-- Aleksey

On 12-09-05 03:54 PM, Aleksey Vorona wrote:

Hi,

Running the example Solr from the 3.6.1 distribution, I cannot get it to
keep persistent HTTP connections:

$ ab -c 1 -n 100 -k 'http://localhost:8983/solr/select?q=*:*' | grep
Keep-Alive
Keep-Alive requests:0

What should I change to fix that?

P.S. We have the same issue in production with Jetty 7, but I thought it
would be better to ask about Solr example, since it is easier for anyone
to reproduce the issue.

-- Aleksey





Re: Problem with verifying signature ?

2012-09-05 Thread Chris Hostetter
: I download solr 4.0 beta and the .asc file. I use gpg4win and type this in
: the command line:
: 
: gpg --verify file.zip file.asc
: 
: I get a message like this:
: 
: *gpg: Can't check signature: No public key*

you can verify the .asc sig file using the public KEYS file hosted on the 
main apache download site (do not trust .asc or KEYS files from a download 
mirror; that defeats the point):


https://www.apache.org/dist/lucene/solr/KEYS
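
e.g., something like this (the KEYS URL is the one above; the artifact
names are whatever you downloaded):

gpg --import KEYS
gpg --verify apache-solr-4.0.0-BETA.zip.asc apache-solr-4.0.0-BETA.zip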



-Hoss


deletedPkQuery not work in solr 3.3

2012-09-05 Thread jun Wang
I have a data-config.xml with 2 entities, like

<entity name="full" PK="ID" ...>
...
</entity>

and

<entity name="delta_build" PK="ID" ...>
...
</entity>

The entity delta_build is for delta import; the query is

?command=full-import&entity=delta_build&clean=false

and I want to use deletedPkQuery to delete from the index, so I have added
these to the entity delta_build:

deltaQuery="select -1 as ID from dual"

deltaImportQuery="select * from product where a.id='${dataimporter.delta.ID}'"

deletedPKQuery="select product_id as ID from modified_product where
gmt_create &gt; to_date('${dataimporter.last_index_time}','yyyy-mm-dd
hh24:mi:ss') and modification = 'deleted'"

deltaQuery and deltaImportQuery are simply there to keep the delta import
from touching any records, because delta import has been implemented via
full import; I just want to use delta for deleting from the index.
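
Side note in case it helps: the config above spells the attribute
deletedPKQuery, while the subject line and the DIH wiki spell it
deletedPkQuery (lower-case k), and the attribute name is case-sensitive as
far as I know. A sketch in the documented spelling, reusing the query above:

<entity name="delta_build" pk="ID"
        deletedPkQuery="select product_id as ID from modified_product
                        where gmt_create &gt; to_date('${dataimporter.last_index_time}','yyyy-mm-dd hh24:mi:ss')
                        and modification = 'deleted'">
  ...
</entity>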

But when I hit the query

?command=delta-import

deltaQuery and deltaImportQuery can be found in the log, but deletedPKQuery
cannot. Is there anything wrong in the config file?

-- 
from Jun Wang


Re: Problem with verifying signature ?

2012-09-05 Thread Kiran Jayakumar
Thank you Hoss. I imported the KEYS file using *gpg --import KEYS.txt*.
Then I did the *--verify* again. This time I get an output like this:

gpg: Signature made 08/06/12 19:52:21 Pacific Daylight Time using RSA key
ID 322
D7ECA
gpg: Good signature from Robert Muir (Code Signing Key) rm...@apache.org
*gpg: WARNING: This key is not certified with a trusted signature!*
gpg:  There is no indication that the signature belongs to the
owner.
Primary key fingerprint: 6661 9BA3 C030 DD55 3625  1303 817A E1DD 322D 7ECA

Is this acceptable?

Thanks

On Wed, Sep 5, 2012 at 5:38 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

 : I download solr 4.0 beta and the .asc file. I use gpg4win and type this
 in
 : the command line:
 :
 : gpg --verify file.zip file.asc
 :
 : I get a message like this:
 :
 : *gpg: Can't check signature: No public key*

 you can verify the asc sig file using the public KEYS file hosted on the
 main apache download site (do not trust asc or KEYS from a download
 mirror, that defeats the point)


 https://www.apache.org/dist/lucene/solr/KEYS



 -Hoss



Re: Searching of Chinese characters and English

2012-09-05 Thread waynelam

Any thoughts?

It is weird: I can see the words being cut correctly in Field 
Analysis. Almost every website I checked recommends either 
CJKAnalyzer, IKAnalyzer or SmartChineseAnalyzer. But if I can see the 
words being cut, then it should not be a problem with the settings of 
the different Analyzers. Am I correct?


Anyone have an idea or hints?

Thanks so much

Wayne



On 4/9/2012 13:03, waynelam wrote:

Hi all,

I tried to modify the schema.xml and solrconfig.xml that come with the Drupal 
search_api_solr module, so that they are suitable for a CJK 
environment. I can see Chinese words cut up into two-character tokens 
in Field Analysis. If I use the following query


my_ip_address:8080/solr/select?indent=on&version=2.2&fq=t_title:Find&start=0&rows=10&fl=t_title



I can see it returning results. The problem is when I change the 
search keyword for one of my fields (e.g. t_title) to Chinese 
characters. It always shows


<result name="response" numFound="0" start="0"/>

in the results. It is strange because if a title contains both Chinese 
and English (e.g. testing ??), when I search just the English part 
(e.g. fq=t_title:testing), I can find the result perfectly. The 
problem only appears when searching Chinese characters.



I would much appreciate it if you could show me which part I did wrong.

Thanks

Wayne

*My Settings:*
Java : 1.6.0_24
Solr : version 3.6.1
tomcat: version 6.0.35

*My schema.xml* (the "text" fieldType and the t_title field below are the parts I changed from the default)

<fieldType name="text" class="solr.TextField" indexed="true" stored="true" multiValued="true">
  <analyzer type="index" class="org.apache.lucene.analysis.cjk.CJKAnalyzer">
    <tokenizer class="org.apache.lucene.analysis.cjk.CJKTokenizer"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="schema.UnicodeNormalizationFilterFactory" version="icu4j"
            composed="false" remove_diacritics="true" remove_modifiers="true" fold="true"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
  </analyzer>
  <analyzer type="query" class="org.apache.lucene.analysis.cjk.CJKAnalyzer">
    <tokenizer class="org.apache.lucene.analysis.cjk.CJKTokenizer"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="schema.UnicodeNormalizationFilterFactory" version="icu4j"
            composed="false" remove_diacritics="true" remove_modifiers="true" fold="true"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="sortString" class="solr.TextField" indexed="true"
           stored="true" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="rand" class="solr.RandomSortField" indexed="true"/>

<fieldtype name="ignored" stored="true" indexed="false" class="solr.StrField"/>

</types>
<fields>

  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <field name="item_id" type="string" indexed="true" stored="true" required="true"/>
  <field name="index_id" type="string" indexed="true" stored="true" required="true"/>

  <copyField source="item_id" dest="ss_search_api_id"/>
  <field name="spell" type="textSpell" indexed="true" stored="true" multiValued="true"/>

  <copyField source="t_*" dest="spell"/>

  <field name="t_title" type="text" indexed="true" stored="true"
         autoGeneratePhraseQueries="false"/>

  <dynamicField name="t_*" type="text" termVectors="true"/>
  <dynamicField name="ss_*" type="sortString" multiValued="false" termVectors="true"/>
  <dynamicField name="sm_*" type="sortString" multiValued="true" termVectors="true"/>
  <dynamicField name="is_*" type="tlong" multiValued="false" termVectors="true"/>
  <dynamicField name="im_*" type="long" multiValued="true" termVectors="true"/>
  <dynamicField name="fs_*" type="tdouble" multiValued="false" termVectors="true"/>
  <dynamicField name="fm_*" type="tdouble" multiValued="true" termVectors="true"/>
  <dynamicField name="ds_*" type="tdate" multiValued="false" termVectors="true"/>
  <dynamicField name="dm_*" type="tdate" multiValued="true" termVectors="true"/>
  <dynamicField name="bs_*" type="boolean" multiValued="false" termVectors="true"/>
  <dynamicField name="bm_*" type="boolean" multiValued="true" termVectors="true"/>
  <dynamicField name="f_ss_*" type="string" multiValued="false"

Re: Document Processing

2012-09-05 Thread Lance Norskog
There is another way to do this: crawl the mobile site! 

The Fennec browser from Mozilla talks Android. I often use it to get pagecrap 
off my screen.
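
For anyone who wants to try the JSoup route mentioned below, a minimal sketch
-- the URL and CSS selectors are made-up placeholders:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class Extract {
    public static void main(String[] args) throws Exception {
        // Jsoup tolerates messy real-world HTML, so no HTML-to-XML step is needed.
        Document doc = Jsoup.connect("http://example.com/some-article.html").get();
        // CSS selectors play the role XPath would after an XML conversion.
        String title = doc.select("h1.headline").text();      // placeholder selector
        String body = doc.select("div.article-body").text();  // placeholder selector
        System.out.println(title);
        System.out.println(body);
    }
}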

- Original Message -
| From: Lance Norskog goks...@gmail.com
| To: solr-user@lucene.apache.org
| Sent: Wednesday, August 29, 2012 7:37:37 PM
| Subject: Re: Document Processing
| 
| I've seen the JSoup HTML parser library used for this. It worked
| really well. The Boilerpipe library may be what you want. Its
| schwerpunkt (*) is to separate boilerplate from wanted text in an
| HTML
| page. I don't know what fine-grained control it has.
| 
| * raison d'être. There is no English word for this concept.
| 
| On Tue, Dec 6, 2011 at 1:39 PM, Tommaso Teofili
| tommaso.teof...@gmail.com wrote:
|  Hello Michael,
| 
|  I can help you with using the UIMA UpdateRequestProcessor [1]; the
|  current
|  implementation uses in-memory execution of UIMA pipelines but since
|  I was
|  planning to add the support for higher scalability (with UIMA-AS
|  [2]) that
|  may help you as well.
| 
|  Tommaso
| 
|  [1] :
|  
http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java
|  [2] : http://uima.apache.org/doc-uimaas-what.html
| 
|  2011/12/5 Michael Kelleher mj.kelle...@gmail.com
| 
|  Hello Erik,
| 
|  I will take a look at both:
| 
|  org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessor
| 
|  and
| 
|  org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessor
| 
| 
|  and figure out what I need to extend to handle processing in the
|  way I am
|  looking for.  I am assuming that component configuration is
|  handled in a
|  standard way such that I can configure my new UpdateProcessor in
|  the same
|  way I would configure any other UpdateProcessor component?
| 
|  Thanks for the suggestion.
| 
| 
|  1 more question:  given that I am probably going to convert the
|  HTML to
|  XML so I can use XPath expressions to extract my content, do you
|  think
|  that this kind of processing will overload Solr?  This Solr
|  instance will
|  be used solely for indexing, and will only ever have a single
|  ManifoldCF
|  crawling job feeding it documents at one time.
| 
|  --mike
| 
| 
| 
| 
| --
| Lance Norskog
| goks...@gmail.com
| 


Re: Searching of Chinese characters and English

2012-09-05 Thread Lance Norskog
I believe that you should remove the Analyzer class name from the field type; I 
think it overrides the stack of tokenizers/token filters. Other fieldType 
declarations do not have an Analyzer class alongside Tokenizers.
 <analyzer type="index" class="org.apache.lucene.analysis.cjk.CJKAnalyzer">
should be:
 <analyzer type="index">

This may not help with your searching problem.

- Original Message -
| From: waynelam wayne...@ln.edu.hk
| To: solr-user@lucene.apache.org
| Sent: Wednesday, September 5, 2012 8:07:36 PM
| Subject: Re: Searching of Chinese characters and English
| 
| [...quoted message trimmed; the full text appears earlier in this thread...]

Re: Searching of Chinese characters and English

2012-09-05 Thread waynelam

Thank you Lance.
I just found out the problem; in case somebody else comes across this:
it turned out that Tomcat does not accept UTF-8 in URLs 
by default.


http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config

I have no idea why that is the default, but the problem went away after I 
followed the instructions in the document above.
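
For anyone who doesn't want to chase the link: the change boils down to
adding a URIEncoding attribute to the HTTP Connector in Tomcat's
conf/server.xml -- the port and other attributes below are just from my setup:

<Connector port="8080" protocol="HTTP/1.1" URIEncoding="UTF-8"
           connectionTimeout="20000" redirectPort="8443"/>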


Problem solved!!

Thanks so much for your help!


Wayne


On 6/9/2012 11:19, Lance Norskog wrote:

I believe that you should remove the Analyzer class name from the field type; I think 
it overrides the stack of tokenizers/token filters. Other fieldType 
declarations do not have an Analyzer class alongside Tokenizers.
  <analyzer type="index" class="org.apache.lucene.analysis.cjk.CJKAnalyzer">
should be:
  <analyzer type="index">

This may not help with your searching problem.

- Original Message -
| From: waynelam wayne...@ln.edu.hk
| [...rest of quoted message trimmed; the full text appears earlier in this thread...]