Re: what happens with slave during replication?

2012-09-24 Thread Bernd Fehling
Hi Amanda,
we don't use SolrCloud yet, just 3 dedicated servers.
When it comes to distribution, the choice will be either SolrCloud or
ElasticSearch.
But currently we use unix shell scripts with ssh for switching.
Easy, simple, stable :-)
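
For example, if the two slaves are cores in the same servlet container, the
heart of such a switch can be a single CoreAdmin SWAP call per server,
roughly like this (host and core names invented for illustration):

for h in search1 search2 search3; do
  ssh $h "curl -s 'http://localhost:8080/solr/admin/cores?action=SWAP&core=online&other=backup'"
done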

Regards,
Bernd


On 21.09.2012 16:03, yangqian_nj wrote:
 Hi Bernd,
 
 You mentioned: "Only one slave is online, the other is for backup. The backup
 gets replicated first.
 After that the servers will be switched and the online one becomes the backup."
 
 Could you please let us know how you do the switch? We use SWAP to switch
 in solr cloud. After SWAP, when we query, from the tomcat log we can see
 the query actually goes to both cores for some reason.
 
 Thanks,
 Amanda
 
 
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/what-happends-with-slave-during-repliacation-tp4009100p4009417.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 


Re: Return only matched multiValued field

2012-09-24 Thread Mikhail Khludnev
Hi
It seems like a job for the highlighting feature.
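
A request sketch (field name from your example; hl.fragsize=0 asks the
standard highlighter for the whole field value instead of a snippet, and
hl.snippets raises the one-fragment-per-field default):

http://localhost:8983/solr/select?q=comment:gold&hl=true&hl.fl=comment&hl.fragsize=0&hl.snippets=10&fl=id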
On 24.09.2012 0:51, Dotan Cohen dotanco...@gmail.com wrote:

 Assuming a multivalued, stored and indexed field with name "comment".
 When performing a search, I would like to return only the values of
 comment which contain the match. For example:

 When searching for "gold", instead of getting this result:

 <doc>
   <arr name="comment">
     <str>There's a lady who's sure</str>
     <str>all that glitters is gold</str>
     <str>and she's buying a stairway to heaven</str>
   </arr>
 </doc>

 I would prefer to get this result:

 <doc>
   <arr name="comment">
     <str>all that glitters is gold</str>
   </arr>
 </doc>

 (pseudo-XML from memory, may not be accurate but illustrates the point)

 Is there any way to do this with a Solr 4 index? The client accessing
 Solr is on a dial-up connection (no provision for DSL or other high
 speed internet) so I'd like to move as little data over the wire as
 possible. In reality, the array will have tens of fields so returning
 only the relevant fields may reduce the data transferred by an order
 of magnitude.

 Thanks.

 --
 Dotan Cohen

 http://gibberish.co.il
 http://what-is-what.com



Solrcloud not reachable and after restart just a no servers hosting shard

2012-09-24 Thread Daniel Brügge
Hi,

I am running Solrcloud 4.0-BETA and during the weekend it 'crashed' somehow,
so that it wasn't reachable. CPU load was 100%.

After a restart I couldn't access the data, it just told me:

no servers hosting shard

Is there a way to get the data back?

Thanks & regards

Daniel


Re: Solrcloud not reachable and after restart just a no servers hosting shard

2012-09-24 Thread Sami Siren
hi,

Can you share a little bit more about your configuration: how many
shards, # of replicas, what does your clusterstate.json look like,
anything suspicious in the logs?

--
 Sami Siren

On Mon, Sep 24, 2012 at 11:13 AM, Daniel Brügge
daniel.brue...@gmail.com wrote:
 Hi,

 I am running Solrcloud 4.0-BETA and during the weekend it 'crashed' somehow,
 so that it wasn't reachable. CPU load was 100%.

 After a restart I couldn't access the data, it just told me:

 no servers hosting shard

 Is there a way to get the data back?

 Thanks & regards

 Daniel


Solr - Remove specific punctuation marks

2012-09-24 Thread Daisy
Hi;

I am working with apache-solr-3.6.0 on windows machine. I would like to
remove all punctuation marks before indexing except the colon and the
full-stop.

I tried:

<fieldType name="text_ar" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="[\p{Punct}&&[^\.^\:]]" replacement="" replace="all"/>
  </analyzer>
</fieldType>
But it didn't work. Any Ideas?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Remove-specific-punctuation-marks-tp4009795.html
Sent from the Solr - User mailing list archive at Nabble.com.


solrcloud without realtime

2012-09-24 Thread Radim Kolar
Is it possible to use solrcloud without the real-time features? In my
application I do not need realtime features, and old-style processing
should be more efficient.


RE: Nodes cannot recover and become unavailable

2012-09-24 Thread Markus Jelsma
It seems my clusterstate.json is still old. Is there a method to recreate it
without taking all nodes down at the same time?

 
 
-Original message-
 From:Markus Jelsma markus.jel...@openindex.io
 Sent: Thu 20-Sep-2012 10:14
 To: solr-user@lucene.apache.org
 Subject: RE: Nodes cannot recover and become unavailable
 
 Hi - at first I didn't recreate the Zookeeper data but I got it to work. I'll
 check the removal of the LOG line.
 
 thanks
  
 -Original message-
  From:Sami Siren ssi...@gmail.com
  Sent: Wed 19-Sep-2012 17:45
  To: solr-user@lucene.apache.org
  Subject: Re: Nodes cannot recover and become unavailable
  
  also, did you re create the cluster after upgrading to a newer
  version? I believe there were some changes made to the
  clusterstate.json recently that are not backwards compatible.
  
  --
   Sami Siren
  
  
  
  On Wed, Sep 19, 2012 at 6:21 PM, Sami Siren ssi...@gmail.com wrote:
   Hi,
  
   I am having troubles understanding the reason for that NPE.
  
   First you could try removing line #102 in HttpClientUtil so
   that logging does not prevent creation of the http client in
   SyncStrategy.
  
   --
Sami Siren
  
   On Wed, Sep 19, 2012 at 5:29 PM, Markus Jelsma
   markus.jel...@openindex.io wrote:
   Hi,
  
   Since the 2012-09-17 11:10:41 build shards start to have trouble coming 
   back online. When i restart one node the slices on the other nodes are 
   throwing exceptions and cannot be queried. I'm not sure how to remedy 
   the problem but stopping a node or restarting it a few times seems to 
   help it. The problem is when i restart a node, and it happens, i must 
   not restart another node because that may trigger other slices becoming 
   unavailable.
  
   Here are some parts of the log:
  
   2012-09-19 14:13:18,149 ERROR [solr.cloud.RecoveryStrategy] - 
   [RecoveryThread] - : Recovery failed - trying again... core=oi_i
   2012-09-19 14:13:25,818 WARN [solr.cloud.RecoveryStrategy] - 
   [main-EventThread] - : Stopping recovery for 
   zkNodeName=nl10.host:8080_solr_oi_icore=oi_i
   2012-09-19 14:13:44,497 WARN [solr.cloud.RecoveryStrategy] - [Thread-4] 
   - : Stopping recovery for zkNodeName=nl10.host:8080_solr_oi_jcore=oi_j
   2012-09-19 14:14:00,321 ERROR [solr.cloud.RecoveryStrategy] - 
   [RecoveryThread] - : Error while trying to recover. 
   core=oi_i:org.apache.solr.common.SolrException: We are not the leader
   at 
   org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:402)
   at 
   org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:182)
   at 
   org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:199)
   at 
   org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:388)
   at 
   org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:220)
  
   2012-09-19 14:14:00,321 ERROR [solr.cloud.RecoveryStrategy] - 
   [RecoveryThread] - : Recovery failed - trying again... core=oi_i
   2012-09-19 14:14:00,321 ERROR [solr.cloud.RecoveryStrategy] - 
   [RecoveryThread] - : Recovery failed - max retries exceeded. core=oi_i
   2012-09-19 14:14:00,321 ERROR [solr.cloud.RecoveryStrategy] - 
   [RecoveryThread] - : Recovery failed - I give up. core=oi_i
   2012-09-19 14:14:00,333 WARN [solr.cloud.RecoveryStrategy] - 
   [RecoveryThread] - : Stopping recovery for 
   zkNodeName=nl10.host:8080_solr_oi_icore=oi_i
ERROR [solr.cloud.SyncStrategy] - [main-EventThread] - : Sync request 
   error: java.lang.NullPointerException
ERROR [solr.cloud.SyncStrategy] - [main-EventThread] - : 
   http://nl10.host:8080/solr/oi_i/: Could not tell a replica to 
   recover:java.lang.NullPointerException
   at 
   org.slf4j.impl.Log4jLoggerAdapter.info(Log4jLoggerAdapter.java:305)
   at 
   org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:102)
   at 
   org.apache.solr.client.solrj.impl.HttpSolrServer.init(HttpSolrServer.java:155)
   at 
   org.apache.solr.client.solrj.impl.HttpSolrServer.init(HttpSolrServer.java:128)
   at 
   org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:262)
   at 
   org.apache.solr.cloud.SyncStrategy.requestRecovery(SyncStrategy.java:272)
   at 
   org.apache.solr.cloud.SyncStrategy.syncToMe(SyncStrategy.java:203)
   at 
   org.apache.solr.cloud.SyncStrategy.syncReplicas(SyncStrategy.java:125)
   at org.apache.solr.cloud.SyncStrategy.sync(SyncStrategy.java:87)
   at 
   org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:169)
   at 
   org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:158)
   at 
   org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:102)
   at 
   org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:275)
   at 
   

Re: Return only matched multiValued field

2012-09-24 Thread Dotan Cohen
 <field name="doctest" type="textmulti" stored="true"
 indexed="true"
 multiValued="true" />
 </fields>
 <defaultSearchField>doctest</defaultSearchField>

Note that in anonymizing the information, I introduced a typo. The
above doctest should be doctext. In any case, the field names in
the production application and in production schema do in fact match!


-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Items disappearing from Solr index

2012-09-24 Thread Kissue Kissue
Hi,

I am running Solr 3.5, using SolrJ and using StreamingUpdateSolrServer to
index and delete items from solr.

I basically index items from the db into solr every night. Existing items
can be marked for deletion in the db and a delete request sent to solr to
delete such items.

My process runs as follows every night:

1. Check if items have been marked for deletion and delete from solr. I
commit and optimize after the entire solr deletion runs.
2. Index any new items to solr. I commit and optimize after all the new
items have been added.

Recently I started noticing that huge chunks of items that have not been
marked for deletion are disappearing from the index. I checked the solr
logs and the logs indicate that it is deleting exactly the number of items
requested, but still a lot of other items disappear from the index from time
to time. Any ideas what might be causing this or what I am doing wrong?


Thanks.


Understanding autoSoftCommit

2012-09-24 Thread Trym R. Møller

Hi

On my windows workstation I have tried to index a document into a 
SolrCloud instance with the following special configuration:

<autoCommit>
  <maxTime>1200000</maxTime>
</autoCommit>
<autoSoftCommit>
  <maxTime>600000</maxTime>
</autoSoftCommit>
...
<updateLog>
  <str name="dir">${solr.data.dir:}</str>
</updateLog>
That is, hard commit every 20 minutes and soft commit every 10 minutes.

Right after indexing I can find the document using /get (and not using 
/search) and after 10 minutes I can find it as well using /search.
If I stop Solr using Ctrl+C or kill -9 (from my cygwin console) before
the 10 minutes have passed and start Solr again, then I can find the
document using both /get and /search.


Are there any scenarios where I will lose an indexed document before
either commit or soft commit is triggered?

And does the transaction log have anything to do with this...

Thanks in advance.

Best regards Trym


Range operator problems in Chef (automating framework)

2012-09-24 Thread Christian Bordis
Hi Everyone!

We're doing some nice stuff with Chef (http://wiki.opscode.com/display/chef/Home).
It uses solr for search, but range queries don't work as expected. Maybe Chef or
solr is just buggy, or I am doing it wrong ;-)

In Chef I have a bunch of nodes with a timestamp attribute. Now I want to search
nodes that have a timestamp not older than one hour:

search(:node, "role:JSlave AND ohai_time:[NOW-1HOUR TO *]")

Is this string in the call a solr-compliant range expression at all? Unfortunately,
I have no toys at hand to verify this myself at the moment... but I work on
this.

Thanks for reading! ^^

Kind Regards,

Christian Bordis


Performance Degradation on Migrating from 1.3 to solr 3.6.1

2012-09-24 Thread Sujatha Arun
Hi,

On migrating from 1.3 to 3.6.1, I see the query performance degrading by
nearly 2 times for all types of query. Indexing performance shows a slight
degradation over 1.3. For indexing we use our custom scripts that post xml
over HTTP.

Anything that I might have missed? I am thinking that this might be due
to the new Tiered MP over LogByteSize creating more segment files and hence
more query latency. We are using Compound Files in 1.3 and I have set
this to true even in 3.6.1, but it results in more segment files.

On optimizing, the query response time improved beyond 1.3. So could it be
the MP or am I missing something here? Do let me know.

Please find attached the solrconfig.xml

Regards
Sujatha
<?xml version="1.0" encoding="UTF-8" ?>
<!--
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License.  You may obtain a copy of the License at

 http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->


<config>

<abortOnConfigurationError>${solr.abortOnConfigurationError:true}</abortOnConfigurationError>

  <luceneMatchVersion>LUCENE_36</luceneMatchVersion>

  <!-- The DirectoryFactory to use for indexes.
       solr.StandardDirectoryFactory, the default, is filesystem based.
       solr.RAMDirectoryFactory is memory based, not persistent, and doesn't work with replication. -->
  <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>

<indexConfig>
  <!-- Values here affect all index writers and act as a default unless overridden. -->

  <useCompoundFile>true</useCompoundFile>
  <mergeFactor>4</mergeFactor>

  <maxFieldLength>1</maxFieldLength>
  <writeLockTimeout>1000</writeLockTimeout>

  <!-- <mergePolicy class="org.apache.lucene.index.LogByteSizeMergePolicy" /> -->

  <!-- <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
         <int name="maxMergeAtOnce">4</int>
         <int name="segmentsPerTier">4</int>
       </mergePolicy> -->

  <lockType>single</lockType>
</indexConfig>

<jmx />
<updateHandler class="solr.DirectUpdateHandler2" />

<query>

  <maxBooleanClauses>10</maxBooleanClauses>

  <filterCache class="solr.FastLRUCache"
               size="16384"
               initialSize="4096"
               autowarmCount="1024" />

  <queryResultCache class="solr.LRUCache"
                    size="16384"
                    initialSize="4096"
                    autowarmCount="1024"/>

  <documentCache class="solr.LRUCache"
                 size="16384"
                 initialSize="4096"
                 autowarmCount="0"/>

  <enableLazyFieldLoading>true</enableLazyFieldLoading>
  <useFilterForSortedQuery>true</useFilterForSortedQuery>
  <queryResultWindowSize>50</queryResultWindowSize>
  <queryResultMaxDocsCached>200</queryResultMaxDocsCached>

  <useColdSearcher>false</useColdSearcher>
  <maxWarmingSearchers>2</maxWarmingSearchers>
</query>

<requestDispatcher handleSelect="true">
  <requestParsers enableRemoteStreaming="false" multipartUploadLimitInKB="2048" />
  <httpCaching never304="true" />
  <!-- <httpCaching lastModifiedFrom="openTime"
                    etagSeed="Solr">
       </httpCaching> -->
</requestDispatcher>

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <!-- default values for query parameters -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <!--
    <int name="rows">10</int>
    <str name="fl">*</str>
    <str name="version">2.1</str>
    -->
  </lst>
</requestHandler>

<requestHandler name="/update" class="solr.XmlUpdateRequestHandler" />
<!-- <requestHandler name="/analysis" class="solr.AnalysisRequestHandler" /> -->
<requestHandler name="/admin/" class="solr.admin.AdminHandlers" />
<requestHandler name="/analysis/field"
                startup="lazy"
                class="solr.FieldAnalysisRequestHandler" />

<requestHandler name="/analysis/document"
                class="solr.DocumentAnalysisRequestHandler"
                startup="lazy" />

<requestHandler name="/admin/ping" class="solr.PingRequestHandler">
  <lst name="invariants">
    <str name="q">solrpingquery</str>
  </lst>
  <lst name="defaults">
    <str name="echoParams">all</str>
  </lst>
</requestHandler>

<!-- Echo the request contents back to the client -->
<requestHandler name="/debug/dump" class="solr.DumpRequestHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str> <!-- for all params (including the default etc) use: 'all' -->
    <str

Re: Help with new Join Functionality in Solr 4.0

2012-09-24 Thread Erick Erickson
NP, good luck!

On Sun, Sep 23, 2012 at 3:41 PM,  milen.ti...@materna.de wrote:
 Hello Erick,

 Thanks a lot for your reply! Your suggestion is actually exactly the 
 alternative solution we are thinking about and with your clarification on 
 Solr's performance we are going to go for it! Many thanks again!

 Milen

 
 From: Erick Erickson [erickerick...@gmail.com]
 Sent: Sunday, 23 September 2012 17:50
 To: solr-user@lucene.apache.org
 Subject: Re: Help with new Join Functionality in Solr 4.0

 The very first thing to try is to flatten your data so you don't have to use
 joins.
 I know that goes against your database instincts, but Solr easily handles
 millions and millions of documents. So if the cross-product of docs and 
 modules
 isn't prohibitive, that's what I'd do first. Then it's just a matter of
 forming a search without joins

 Joins run into performance issues when the join field has many unique
 values, unfortunately the field people often want to join on is something
 like a uniqueKey (or PK in RDBMS terms), so be aware of that.
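
 For reference, an embedded join for the model described would look something
 like this (field names from the original post; the document id is invented):

 q=text:foo AND _query_:"{!join from=modrefid to=id}docrefid:doc123"

 i.e. find the docmodule entries for doc123, follow their modrefid values to
 module ids, and intersect that with the text match.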

 Best
 Erick

 On Fri, Sep 21, 2012 at 5:46 AM,  milen.ti...@materna.de wrote:
 Dear Solr community,

 I am rather new to Solr, however I already find it kind of attractive. We 
 are developing a research application, which contains a Solr index with 
 three different kinds of documents, here the basic idea:


 -  A document of type doc consisting of fields id, docid, doctitle 
 and some other metadata

 -  A document of type module consisting of fields id, modid and 
 text

 -  A document of type docmodule consisting of fields id, docrefid, 
 modrefid and some metadata about the relation between a document and a 
 module; filed docrefid refers to the id of a doc document, while field 
 modrefid contains the id of a module document

 In other words, in our model there are documents (type doc) consisting of 
 several modules and there is some characterization of each link between a 
 document and a module.

 Almost all fields of a doc document are searchable, as well as the text of 
 a module and the metadata of the docmodule entries.

 We are looking for a fast way to retrieve all modules containing a certain 
 text and associated with a given document, preferably with a single query. 
 This means we want to query the text from a module document while we set a 
 restriction on the docrefid from a docmodule or the id from a doc 
 document. Is this possible by means of the new pseudo joins? Any ideas are 
 highly appreciated!

 Thanks in advance!

 Milen Tilev
 Master of Science
 Softwareentwickler
 Business Unit Information
 

 MATERNA GmbH
 Information  Communications

 Voßkuhle 37
 44141 Dortmund
 Deutschland

 Telefon: +49 231 5599-8257
 Fax: +49 231 5599-98257
 E-Mail: milen.ti...@materna.demailto:milen.ti...@materna.de

 www.materna.dehttp://www.materna.de/ | 
 Newsletterhttp://www.materna.de/newsletter | 
 Twitterhttp://twitter.com/MATERNA_GmbH | 
 XINGhttp://www.xing.com/companies/MATERNAGMBH | 
 Facebookhttp://www.facebook.com/maternagmbh
 

 Sitz der MATERNA GmbH: Voßkuhle 37, 44141 Dortmund
 Geschäftsführer: Dr. Winfried Materna, Helmut an de Meulen, Ralph Hartwig
 Amtsgericht Dortmund HRB 5839



Re: solrcloud without realtime

2012-09-24 Thread Erick Erickson
I'm pretty sure all you need to do is disable autoSoftCommit. Or rather,
don't un-comment it in solrconfig.xml.
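
i.e. leave it like this (the commented-out stock example; the value shown is
just illustrative):

<!--
<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>
-->

and let your normal hard commits control visibility.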

Best
Erick

On Mon, Sep 24, 2012 at 5:44 AM, Radim Kolar h...@filez.com wrote:
 Is it possible to use solrcloud without the real-time features? In my
 application I do not need realtime features, and old-style processing should
 be more efficient.


Re: Return only matched multiValued field

2012-09-24 Thread Erick Erickson
Hmmm, works for me. What is your entire response packet?

And you've covered the bases with indexed and stored so this
seems like it _should_ work.

Best
Erick

On Mon, Sep 24, 2012 at 6:12 AM, Dotan Cohen dotanco...@gmail.com wrote:
 <field name="doctest" type="textmulti" stored="true"
 indexed="true"
 multiValued="true" />
 </fields>
 <defaultSearchField>doctest</defaultSearchField>

 Note that in anonymizing the information, I introduced a typo. The
 above doctest should be doctext. In any case, the field names in
 the production application and in production schema do in fact match!


 --
 Dotan Cohen

 http://gibberish.co.il
 http://what-is-what.com


Re: Items disappearing from Solr index

2012-09-24 Thread Erick Erickson
How do you delete items? By ID or by query?

My guess is that one of two things is happening:
1 your delete process is deleting too much data.
2 your index process isn't indexing what you think.

I'd add some logging to the SolrJ program to see what
it thinks it has deleted or added to the index and go from there.
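
Something as simple as this around the delete call would do (a sketch; it
assumes an existing SolrServer instance and an SLF4J logger):

UpdateResponse rsp = solr.deleteByQuery(deleteQuery);
log.info("deleteByQuery [" + deleteQuery + "] status=" + rsp.getStatus());

and the same around each add, then compare against what the Solr logs report.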

Best
Erick

On Mon, Sep 24, 2012 at 6:55 AM, Kissue Kissue kissue...@gmail.com wrote:
 Hi,

 I am running Solr 3.5, using SolrJ and using StreamingUpdateSolrServer to
 index and delete items from solr.

 I basically index items from the db into solr every night. Existing items
 can be marked for deletion in the db and a delete request sent to solr to
 delete such items.

 My process runs as follows every night:

 1. Check if items have been marked for deletion and delete from solr. I
 commit and optimize after the entire solr deletion runs.
 2. Index any new items to solr. I commit and optimize after all the new
 items have been added.

 Recently I started noticing that huge chunks of items that have not been
 marked for deletion are disappearing from the index. I checked the solr
 logs and the logs indicate that it is deleting exactly the number of items
 requested, but still a lot of other items disappear from the index from time
 to time. Any ideas what might be causing this or what I am doing wrong?


 Thanks.


Solr is not indexing after MySQL upgrade

2012-09-24 Thread Rahul Paul
Indexing is not happening after 'x' documents.

I am using Bitnami and had upgraded the MySQL server from MySQL 5.1.* to MySQL
5.5.*. After the upgrade, when I run indexing on Solr, the data does not get
indexed.

I am using a procedure in which I am finding the parent of a child and
inserting it into a table which uses MyISAM as memory. Individually, when I ran
the procedure in MySQL 5.1.* and MySQL 5.5.*, it works fine in both cases.

In Solr I am calling the procedure, and after that I am executing some sql
statements against the table I have created in the above procedure. When I run
both the procedure and the queries together, the data does not get indexed; but if I
run the procedure separately in Solr without executing the queries it works
fine, and if I comment out the procedure and run the queries it also works fine.
Only when I run both together does the data not get indexed.
Can anyone recommend some solution to this?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-is-not-Indexing-after-Mysql-Upgradation-tp4009790.html
Sent from the Solr - User mailing list archive at Nabble.com.


Splitting up a location to make it searchable

2012-09-24 Thread Spadez
I am using Google for location input. 

*It often spits out something like this:*
Shorewood, Seattle, Wa

*Since I am using this index analyzer:*
<filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="30"
/>

It means that if I search for "Sho" or "Shorew" I get the result I want.
However, if I search for "Sea" or "Seatt" I get no results.

I guess I need to break the location down into "Shorewood" "Seattle" "Wa"
instead of "Shorewood, Seattle, Wa".

Can this be done easily and efficiently within Solr, perhaps as an index
analyzer?
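
Something like this is what I have in mind, swapping the whitespace tokenizer
for one that splits on the commas (untested sketch):

<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="30"/>

That way "Seattle" would become its own token before the edge n-grams are
built.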




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Splitting-up-a-location-to-make-it-searchable-tp4009825.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Solr - Remove specific punctuation marks

2012-09-24 Thread Steven A Rowe
Hi Daisy,

I can't see anything wrong with the regex or the XML syntax.

One possibility: if it's Arabic you're matching against, you may want to add 
ARABIC FULL STOP U+06D4 to the set you subtract from \p{Punct}.
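
That would make the pattern something like this (untested, and remember the
ampersands have to be XML-encoded in the schema):

pattern="[\p{Punct}&amp;&amp;[^\.^\:\u06D4]]"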

If you give an example of your input and your expected output, I might be able 
to help more.

Steve

-Original Message-
From: Daisy [mailto:omnia.za...@gmail.com] 
Sent: Monday, September 24, 2012 5:08 AM
To: solr-user@lucene.apache.org
Subject: Solr - Remove specific punctuation marks

Hi;

I am working with apache-solr-3.6.0 on windows machine. I would like to
remove all punctuation marks before indexing except the colon and the
full-stop.

I tried:

<fieldType name="text_ar" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="[\p{Punct}&&[^\.^\:]]" replacement="" replace="all"/>
  </analyzer>
</fieldType>
But it didn't work. Any Ideas?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Remove-specific-punctuation-marks-tp4009795.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Solr - Remove specific punctuation marks

2012-09-24 Thread Daisy
Yes, I am trying to index an Arabic document. There is a problem: the
regex couldn't be understood in the solr schema and it gives a 500 error
code.
Here is an example:

input:

هذا مثال: للتوضيح (مثال علي علامات الترقيم) انتهي.

I tried also the regex: pattern="([\(\)\}\{\,[^.:\s+\S+]])"
but I failed to remove the brackets from the text above; when I searched for
a bracket I found a result.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Remove-specific-punctuation-marks-tp4009795p4009830.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Solr - Remove specific punctuation marks

2012-09-24 Thread Markus Jelsma


 
 
-Original message-
 From:Daisy omnia.za...@gmail.com
 Sent: Mon 24-Sep-2012 15:09
 To: solr-user@lucene.apache.org
 Subject: RE: Solr - Remove specific punctuation marks
 
 Yes, I am trying to index an Arabic document. There is a problem: the
 regex couldn't be understood in the solr schema and it gives a 500 error
 code.

The config is XML. Try encoding the ampersand as &amp;
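
i.e. the attribute in schema.xml has to literally read:

pattern="[\p{Punct}&amp;&amp;[^\.^\:]]"

so that the XML parser hands the regex engine the intended &&.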

 Here is an example:
 
 input:
 
 هذا مثال: للتوضيح (مثال علي علامات الترقيم) انتهي.
 
 I tried also the regex: pattern="([\(\)\}\{\,[^.:\s+\S+]])"
 but I failed to remove the brackets from the text above; when I searched for
 a bracket I found a result.
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-Remove-specific-punctuation-marks-tp4009795p4009830.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 


Re: Return only matched multiValued field

2012-09-24 Thread Dotan Cohen
On Mon, Sep 24, 2012 at 2:16 PM, Erick Erickson erickerick...@gmail.com wrote:
 Hmmm, works for me. What is your entire response packet?

 And you've covered the bases with indexed and stored so this
 seems like it _should_ work.


I'm sorry, reducing the output to rows=1 helped me notice that the
highlighted sections come after the main results. The highlighting
feature works as expected.

-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: solrcloud without realtime

2012-09-24 Thread Radim Kolar

On 24.9.2012 14:05, Erick Erickson wrote:

I'm pretty sure all you need to do is disable autoSoftCommit. Or rather
don't un-comment it from solrconfig.xml
and what about solr.NRTCachingDirectoryFactory? Is
solr.MMapDirectoryFactory faster if there are no NRT search requirements?


RE: Solr - Remove specific punctuation marks

2012-09-24 Thread Daisy
I tried &amp; and it solved the 500 error code. But still it could find
punctuation marks.
Although the parsed query didn't contain the punctuation mark:

<str name="rawquerystring">{</str>
<str name="querystring">{</str>
<str name="parsedquery">text:</str>
<str name="parsedquery_toString">text:</str>

numFound still gives 1:

<result name="response" numFound="1" start="0">

and the highlight shows the result of the punctuation mark:
<em>{</em>
The steps I did:
1- edit the schema
2- restart the server
3- delete the file
4- re-index the file




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Remove-specific-punctuation-marks-tp4009795p4009835.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Return only matched multiValued field

2012-09-24 Thread Dotan Cohen
On Mon, Sep 24, 2012 at 9:47 AM, Mikhail Khludnev
mkhlud...@griddynamics.com wrote:
 Hi
 It seems like a job for the highlighting feature.

Thank you Mikhail. I actually do need the entire matched single entry,
not a snippet of it. Looking at the example in the OP, with
highlighting on "gold" I would get:

<em>glitters is gold</em>

Whereas I need:
<str>all that glitters is gold</str>

Thanks.

-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Range operator problems in Chef (automating framework)

2012-09-24 Thread Jack Krupansky
That looks like a valid Solr date math expression, but you need to make sure 
that the field type is actually a Solr DateField as opposed to simply an 
integer Unix time value.
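
For example, something along these lines in the schema (names illustrative):

<fieldType name="date" class="solr.DateField"/>
<field name="ohai_time" type="date" indexed="true" stored="true"/>

If ohai_time is indexed as a plain numeric Unix timestamp instead, NOW-1HOUR
has nothing to evaluate against.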


-- Jack Krupansky

-Original Message- 
From: Christian Bordis

Sent: Monday, September 24, 2012 7:16 AM
To: solr-user@lucene.apache.org
Subject: Range operator problems in Chef (automating framework)

Hi Everyone!

We're doing some nice stuff with Chef
(http://wiki.opscode.com/display/chef/Home). It uses solr for search, but
range queries don't work as expected. Maybe Chef or solr is just buggy, or I am
doing it wrong ;-)


In Chef I have a bunch of nodes with a timestamp attribute. Now I want to search
nodes that have a timestamp not older than one hour:


search(:node, "role:JSlave AND ohai_time:[NOW-1HOUR TO *]")

Is this string in the call a solr-compliant range expression at all?
Unfortunately, I have no toys at hand to verify this myself at the moment...
but I work on this.


Thanks for reading! ^^

Kind Regards,

Christian Bordis 



Re: solrcloud without realtime

2012-09-24 Thread Mark Miller
On Mon, Sep 24, 2012 at 9:21 AM, Radim Kolar h...@filez.com wrote:

 and what about solr.NRTCachingDirectoryFactory? Is solr.MMapDirectoryFactory
 faster if there are no NRT search requirements?

NRTCachingDirectoryFactory is a wrapping directory - it's generally
going to use solr.MMapDirectoryFactory as its delegate anyhow.

It should not hurt performance if you are not using NRT though.

-- 
- Mark


Re: Qtime Vs DebugComponent Timing

2012-09-24 Thread Jack Krupansky
And QTime doesn't include the time spent in the container (e.g., Tomcat or 
Jetty) or network latency. Usually a query benchmark would be from the time 
the client sent the query request until the time the client received the 
query results.


The debug timing will help you understand which Solr components are 
consuming the time. For example, the highlighter, but that is still part of 
overall query processing time.


-- Jack Krupansky

-Original Message- 
From: Sujatha Arun

Sent: Monday, September 24, 2012 8:58 AM
To: solr-user@lucene.apache.org
Subject: Qtime Vs DebugComponent Timing

What's the difference between the QTime that gets returned in the xml
results vs. the debugComponent QParser timing breakup, and which one should
be considered for benchmarking performance of solr?

I understand that QTime is the total time taken by solr to execute a search,
in ms. But the QParser breakup in the debugComponent does not exactly reflect
this... So which one should be used for benchmark purposes?

Regards 



Re: Understanding autoSoftCommit

2012-09-24 Thread Mark Miller
autoCommit (hard commit) is basically just to reduce how much RAM is
needed for the transaction log. You should generally use it with
openSearcher=false and don't need to use it for visibility.

It's also not required for durability due to the transaction log.

Soft commit should be used for visibility. It's also got nothing to do
with durability.

For durability, the idea is that if Solr accepts your update, it's in.
And yes, the transaction log is part of that.
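
So a typical pairing looks something like this (times purely illustrative):

<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>

Hard commits keep the transaction log bounded; soft commits control when
updates become searchable.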

- Mark

On Mon, Sep 24, 2012 at 7:12 AM, Trym R. Møller t...@sigmat.dk wrote:
 Hi

 On my windows workstation I have tried to index a document into a SolrCloud
 instance with the following special configuration:
 <autoCommit>
 <maxTime>1200000</maxTime>
 </autoCommit>
 <autoSoftCommit>
 <maxTime>600000</maxTime>
 </autoSoftCommit>
 ...
 <updateLog>
   <str name="dir">${solr.data.dir:}</str>
 </updateLog>
 That is, hard commit every 20 minutes and soft commit every 10 minutes.

 Right after indexing I can find the document using /get (and not using
 /search) and after 10 minutes I can find it as well using /search.
 If I stop Solr using Ctrl+C or kill -9 (from my cygwin console) before the
 10 minutes have passed and start Solr again, then I can find the document
 using both /get and /search.

 Are there any scenarios where I will lose an indexed document before either
 commit or soft commit is triggered?
 And does the transaction log have anything to do with this...

 Thanks in advance.

 Best regards Trym



-- 
- Mark


Solr Cell Questions

2012-09-24 Thread Johannes . Schwendinger
Hi,

I'm currently experimenting with Solr Cell to index files to Solr. During
this some questions came up.

1. Is it possible (and wise) to connect to Solr Cell with multiple threads
to index several documents at the same time?
This question came up because my program takes about 6 hours to index
around 35000 docs. (No production environment, only the example solr and a
little desktop machine, but I think it's very slow, and I know solr isn't
the bottleneck (yet).)

2. If 1 is possible, how many threads should do this and how much memory
does Solr need? I've tried it but I ran into an out-of-memory exception.
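
For reference, this is roughly what I mean by multiple threads (a sketch;
pool size, server URL and the literal.id choice are made up):

import java.io.File;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

ExecutorService pool = Executors.newFixedThreadPool(4);
final SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
for (final File f : files) {
  pool.submit(new Runnable() {
    public void run() {
      try {
        // one extract request per file against /update/extract
        ContentStreamUpdateRequest req =
            new ContentStreamUpdateRequest("/update/extract");
        req.addFile(f); // SolrJ 3.6 signature; 4.x also takes a content type
        req.setParam("literal.id", f.getName());
        server.request(req);
      } catch (Exception e) {
        e.printStackTrace();
      }
    }
  });
}
pool.shutdown();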

Thanks in advance

Best Regards
Johannes

Re: Performance Degradation on Migrating from 1.3 to solr 3.6.1

2012-09-24 Thread Jack Krupansky
Run a query on both old and new with debugQuery=true on your query request and 
look at the component timings for possible insight.

-- Jack Krupansky

From: Sujatha Arun 
Sent: Monday, September 24, 2012 7:26 AM
To: solr-user@lucene.apache.org 
Subject: Performance Degradation on Migrating from 1.3 to solr 3.6.1

Hi, 

On migrating from 1.3 to 3.6.1, I see the query performance degrading by
nearly 2 times for all types of query. Indexing performance shows a slight degradation
over 1.3. For indexing we use our custom scripts that post xml over HTTP.

Anything that I might have missed? I am thinking that this might be due to
the new Tiered MP over LogByteSize creating more segment files and hence more
query latency. We are using Compound Files in 1.3 and I have set this to true
even in 3.6.1, but it results in more segment files.

On optimizing, the query response time improved beyond 1.3. So could it be the
MP or am I missing something here? Do let me know.

Please find attached the solrconfig.xml

Regards
Sujatha

Re: Solrcloud not reachable and after restart just a no servers hosting shard

2012-09-24 Thread Mark Miller
Right - we need logs, admin-cloud dump to clipboard info, anything
else to go on.

On Mon, Sep 24, 2012 at 4:36 AM, Sami Siren ssi...@gmail.com wrote:
 hi,

 Can you share a little bit more about your configuration: how many
 shards, # of replicas, what does your clusterstate.json look like,
 anything suspicious in the logs?

 --
  Sami Siren

 On Mon, Sep 24, 2012 at 11:13 AM, Daniel Brügge
 daniel.brue...@gmail.com wrote:
 Hi,

 I am running Solrcloud 4.0-BETA and during the weekend it 'crashed' somehow,
 so that it wasn't reachable. CPU load was 100%.

 After a restart I couldn't access the data, it just told me:

 no servers hosting shard

 Is there a way to get the data back?

 Thanks & regards

 Daniel



-- 
- Mark


Re: Items disappearing from Solr index

2012-09-24 Thread Kissue Kissue
Hi Erick,

Thanks for your reply. Yes, I am using delete by query. I am currently
logging the number of items to be deleted before handing off to solr, and
from the solr logs I can see it deleted exactly that number. I will verify further.

Thanks.

On Mon, Sep 24, 2012 at 1:21 PM, Erick Erickson erickerick...@gmail.com wrote:

 How do you delete items? By ID or by query?

 My guess is that one of two things is happening:
 1 your delete process is deleting too much data.
 2 your index process isn't indexing what you think.

 I'd add some logging to the SolrJ program to see what
 it thinks is has deleted or added to the index and go from there.

 Best
 Erick

 On Mon, Sep 24, 2012 at 6:55 AM, Kissue Kissue kissue...@gmail.com
 wrote:
  Hi,
 
  I am running Solr 3.5, using SolrJ and using StreamingUpdateSolrServer to
  index and delete items from solr.
 
  I basically index items from the db into solr every night. Existing items
  can be marked for deletion in the db and a delete request sent to solr to
  delete such items.
 
  My process runs as follows every night:
 
  1. Check if items have been marked for deletion and delete from solr. I
  commit and optimize after the entire solr deletion runs.
  2. Index any new items to solr. I commit and optimize after all the new
  items have been added.
 
  Recently I started noticing that huge chunks of items that have not been
  marked for deletion are disappearing from the index. I checked the solr
  logs and the logs indicate that it is deleting exactly the number of items
  requested, but still a lot of other items disappear from the index from time
  to time. Any ideas what might be causing this or what I am doing wrong?
 
 
  Thanks.



solrcloud and csv import hangs

2012-09-24 Thread dan sutton
Hi,

This appears to happen in trunk too.

It appears that the add command request parameters get sent to the
nodes. If I comment these out like so for add and commit:

core/src/java/org/apache/solr/update/processor/DistributedUpdateProcessor.java

-  params = new ModifiableSolrParams(req.getParams());
+  //params = new ModifiableSolrParams(req.getParams());
+  params = new ModifiableSolrParams();

Then things work as expected.

Otherwise, params like stream.url get sent to the replicant nodes,
which causes failure if the file is missing, or worse, repeatedly
imports the same file if it exists on a replicant.

This might not be the right thing to do? ... what should be sent here
for a streaming CSV import?

Dan


On Thu, Sep 20, 2012 at 4:32 PM, dan sutton danbsut...@gmail.com wrote:
 Hi,

 I'm using Solr 4.0-BETA and trying to import a CSV file as follows:

 curl http://localhost:8080/solr/core/update -d overwrite=false -d
 commit=true -d stream.contentType='text/csv;charset=utf-8' -d
 stream.url=file:///dir/file.csv

 I have 2 tomcat servers running on different machines and a separate
 zookeeper quorum (3 zoo servers, 2 on the same machine). This is a 1-shard
 core, replicated to the other machine.

 It seems that for a 255K line file I have 170 docs on the server that
 issued the command, but on the other, the index seems to grow
 unbounded?

 Has anyone seen this, or been successful in using the CSV import
 with solrcloud?

 Cheers,
 Dan


Re: need best solution for indexing and searching multiple, related database tables

2012-09-24 Thread jimtronic
I'm not sure if this will be relevant for you, but this is roughly what I do.
Apologies if it's too basic. 

I have a complex view that normalizes all the data that I need to be
together -- from over a dozen different tables. For one to many and many to
many relationships, I have sql turn the data into a comma delimited string
which the data import handler and the RegexTransformer will split into a
multi-valued field.

So, you might have a schema like this:

<id>123</id>
<name_s>John Smith</name_s>
<attr_products>
  <str>python</str>
  <str>java</str>
  <str>javascript</str>
</attr_products>
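
The data-config side is then just a splitBy on that column (names invented;
the RegexTransformer attribute splitBy does the splitting):

<entity name="person" transformer="RegexTransformer"
        query="SELECT id, name_s, products_csv FROM person_view">
  <field column="attr_products" sourceColName="products_csv" splitBy=","/>
</entity>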

Often I've found that I don't really need to join the data together into one solr
core and it works better to just create a separate core just for that
schema.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/need-best-solution-for-indexing-and-searching-multiple-related-database-tables-tp4009857p4009879.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solrcloud and csv import hangs

2012-09-24 Thread Yonik Seeley
On Mon, Sep 24, 2012 at 11:03 AM, dan sutton danbsut...@gmail.com wrote:
 Hi,

 This appears to happen in trunk too.

 It appears that the add command request parameters get sent to the
 nodes. If I comment these out like so for add and commit:

 core/src/java/org/apache/solr/update/processor/DistributedUpdateProcessor.java

 -  params = new ModifiableSolrParams(req.getParams());
 +  //params = new ModifiableSolrParams(req.getParams());
 +  params = new ModifiableSolrParams();

 Then things work as expected.

 Otherwise, params like stream.url get sent to the replicant nodes,
 which causes failure if the file is missing, or worse, repeatedly
 imports the same file if it exists on a replicant.

 This might not be the right thing to do? ... what should be sent here
 for a streaming CSV import?

 Dan


 On Thu, Sep 20, 2012 at 4:32 PM, dan sutton danbsut...@gmail.com wrote:
 Hi,

 I'm using Solr 4.0-BETA and trying to import a CSV file as follows:

 curl http://localhost:8080/solr/core/update -d overwrite=false -d
 commit=true -d stream.contentType='text/csv;charset=utf-8' -d
 stream.url=file:///dir/file.csv

 I have 2 tomcat servers running on different machines and a separate
 zookeeper quorum (3 zoo servers, 2 on the same machine). This is a 1-shard
 core, replicated to the other machine.

 It seems that for a 255K line file I have 170 docs on the server that
 issued the command, but on the other, the index seems to grow
 unbounded?

 Has anyone seen this, or been successful in using the CSV import
 with solrcloud?

Yikes! Thanks for investigating, this looks pretty serious.
Could you open a JIRA issue for this bug?

-Yonik
http://lucidworks.com


At a high level how does faceting in SolrCloud work?

2012-09-24 Thread Jamie Johnson
I'd like to wrap my head around how faceting in SolrCloud works. Does
Solr ask each shard for its maximum value and then use that to
determine what else should be asked for from other shards, or does it
ask for all values and do the aggregation on the requesting server?


Re: need best solution for indexing and searching multiple, related database tables

2012-09-24 Thread Jack Krupansky
Could you supply some sample user queries and some sample data the queries
should match? In other words, how do your users expect to view the data? 
If you are simply trying to replicate full SQL queries in Solr, you're 
probably going to be disappointed, but if you look at what queries your 
users are likely to want to enter, maybe it won't be so bad.


And maybe Solr's limited join capabilities might be sufficient to bridge 
the gap between a single flat schema and many relational tables.


http://wiki.apache.org/solr/Join

Join support is there, but don't leap before you think carefully.

-- Jack Krupansky

-Original Message- 
From: Thomas J. Brennan

Sent: Monday, September 24, 2012 10:23 AM
To: solr-user@lucene.apache.org
Subject: need best solution for indexing and searching multiple, related 
database tables






I have a requirement to search multiple, related database tables.  Sometimes
I need to join two tables, sometimes three or four and possibly more.  The
tables will generally store structured data related to individuals or
organizations.  This will include things like company, contact and address
tables and may include other child tables like products, assets, etc.  It is
something of a moving target.  Record counts are commonly in the tens of
millions but can be upwards of a few hundred million or even much more.

My understanding is that denormalization is most commonly the preferred
solution.  For two tables that is pretty straightforward.  For three or four
or more tables, or many to many relationships, and depending on the record
counts, this can generate a lot of redundant data, indexing time, etc.

Any information on the best way to design a single approach to this problem
or any options I might employ like faceted search, NoSQL (based on my
limited research I am guessing this is not a solution), etc. would be
greatly appreciated.

Answers that are terribly obvious, even to a newb, are a tiny bit annoying.
Things like "you should test several scenarios" or "there is no one good
solution". I really do appreciate any suggestions that would help me solve
this problem.

Biff

P.S. - I did search the existing posts and found some related topics but
nothing as specific as I was looking for.



Re: Solr - Remove specific punctuation marks

2012-09-24 Thread Jack Krupansky

1. Which query parser are you using?
2. I see the following comment in the Java 6 doc for regex \p{Punct}:
"POSIX character classes (US-ASCII only)", so if any of the punctuation is
some higher Unicode character code, it won't be matched/removed.
3. It seems very odd that the parsed query has empty terms - normally the
query parsers will ignore terms that analyze to zero tokens. Maybe your "{"
is not an ASCII left brace code and is (apparently) unprintable in the
parsed query. Or, maybe there is some encoding problem in the analyzer.


-- Jack Krupansky

-Original Message- 
From: Daisy

Sent: Monday, September 24, 2012 9:26 AM
To: solr-user@lucene.apache.org
Subject: RE: Solr - Remove specific punctuation marks

I tried &amp; and it solved the 500 error code. But still it could find
punctuation marks.
Although the parsed query didn't contain the punctuation mark:

<str name="rawquerystring">{</str>
<str name="querystring">{</str>
<str name="parsedquery">text:</str>
<str name="parsedquery_toString">text:</str>

but still numFound gives 1:

<result name="response" numFound="1" start="0">

and the highlight shows the result of the punctuation mark:
<em>{</em>
The steps I did:
1- edit the schema
2- restart the server
3- delete the file
4- re-index the file




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Remove-specific-punctuation-marks-tp4009795p4009835.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Problem indexing CSV files using post.jar with multivalue fields

2012-09-24 Thread rudywilkjr
Never fails.  Take the time to post this message, only to discover the answer
on my own a few minutes later.

The solution is to surround the -Durl value in double quotes. For example:

java
-Durl="http://localhost:8983/solr/contacts/update/csv?f.address.split=true&f.address.separator=%7C"
-Dtype=text/csv -jar post.jar contact_test.csv

works perfectly.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-indexing-CSV-files-using-post-jar-with-multivalue-fields-tp4009905p4009907.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Performance Degradation on Migrating from 1.3 to solr 3.6.1

2012-09-24 Thread Sujatha Arun
Thanks Jack.

So QTime = sum of all prepare components + sum of all process components -
DebugComponent process/prepare time.

In 3.6.1 the process part of the QueryComponent for the following query seems
to take 8 times more time. Anything missing? For most queries the process
part of the QueryComponent seems to take more time in 3.6.1.


This is *3.6.1*:

<response>
<lst name="responseHeader"><int name="status">0</int>
*<int name="QTime">33</int>*
<lst name="params">
<str name="debugQuery">on</str>
<str name="indent">on</str><str name="start">0</str>
*<str name="q">differential AND equations AND has AND one AND solution</str>*
<str name="rows">10</str><str name="version">2.2</str></lst></lst>

*Debug Output*

<str name="QParser">LuceneQParser</str>
<lst name="timing">
<double name="time">33.0</double>
<lst name="prepare">
<double name="time">3.0</double>
*<lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">3.0</double></lst>*
<lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
<lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
<lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
<lst name="org.apache.solr.handler.component.StatsComponent"><double name="time">0.0</double></lst>
<lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst>
</lst>
<lst name="process"><double name="time">30.0</double>
*<lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">26.0</double></lst>*
<lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
<lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
<lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
<lst name="org.apache.solr.handler.component.StatsComponent"><double name="time">0.0</double></lst>
<lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">4.0</double></lst>
</lst></lst>

*Same query in solr 1.3:*

<lst name="responseHeader">
<int name="status">0</int>
*<int name="QTime">6</int>*
<lst name="params">
<str name="debugQuery">on</str><str name="indent">on</str>
<str name="start">0</str><str name="q">differential AND equations AND has
AND one AND solution</str>
<str name="rows">10</str><str name="version">2.2</str>

Debug Info

<lst name="timing">
<double name="time">6.0</double>
<lst name="prepare">
<double name="time">1.0</double>
*<lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">1.0</double></lst>*
<lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
<lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
<lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
<lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst></lst>

<lst name="process">
<double name="time">5.0</double>
*<lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">3.0</double></lst>*
<lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
<lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
<lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
<lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">2.0</double></lst>



On Mon, Sep 24, 2012 at 7:35 PM, Jack Krupansky j...@basetechnology.com wrote:

 Run a query on both old and new with debugQuery=true on your query
 request and look at the component timings for possible insight.

 -- Jack Krupansky

 From: Sujatha Arun
 Sent: Monday, September 24, 2012 7:26 AM
 To: solr-user@lucene.apache.org
 Subject: Performance Degradation on Migrating from 1.3 to solr 3.6.1

 Hi,

 On migrating from 1.3 to 3.6.1, I see the query performance degrading by
 nearly 2 times for all types of query. Indexing performance shows a slight
 degradation over 1.3. For indexing we use our custom scripts that post xml
 over HTTP.

 Anything that I might have missed? I am thinking that this might be due
 to the new Tiered MP over LogByteSize creating more segment files and hence
 more query latency. We are using Compound Files in 1.3 and I have set
 this to true even in 3.6.1, but it results in more segment files.

 On optimizing, the query response time improved beyond 1.3. So could it be
 the MP or am I missing something here? Do let me know.

 Please find attached the solrconfig.xml

 Regards
 Sujatha



Re: Solr - Remove specific punctuation marks

2012-09-24 Thread Jack Krupansky
I tried it and PRFF is indeed generating an empty token. I don't know how 
Lucene will index or query an empty term. I mean, what it should do. In 
any case, it is best to avoid them.


You should be using a charFilter to simply filter raw characters before 
tokenizing. So, try:


<charFilter class="solr.PatternReplaceCharFilterFactory"/>

It has the same pattern and replacement attributes.

-- Jack Krupansky

-Original Message- 
From: Jack Krupansky

Sent: Monday, September 24, 2012 12:43 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr - Remove specific punctuation marks

1. Which query parser are you using?
2. I see the following comment in the Java 6 doc for regex \p{Punct}:
POSIX character classes (US-ASCII only), so if any of the punctuation is
some higher Unicode character code, it won't be matched/removed.
3. It seems very odd that the parsed query has empty terms - normally the
query parsers will ignore terms that analyze to zero tokens. Maybe your {
is not an ASCII left brace code and is (apparently) unprintable in the
parsed query. Or, maybe there is some encoding problem in the analyzer.

-- Jack Krupansky

-Original Message- 
From: Daisy

Sent: Monday, September 24, 2012 9:26 AM
To: solr-user@lucene.apache.org
Subject: RE: Solr - Remove specific punctuation marks

I tried amp; and it solved the 500 error code. But still it could find
punctuation marks.
Although the parsed query didnt contain the punctuation mark,

str name=rawquerystring{/str
str name=querystring{/str
str name=parsedquerytext:/str
str name=parsedquery_toStringtext:/str

but still the numfound gives 1

result name=response numFound=1 start=0

and the highlight shows the result of punctuation mark
em{/em
The steps I did:
1- editing the schema
2- restart the server
3-delete the file
4-index the file




--
View this message in context:
http://lucene.472066.n3.nabble.com/Solr-Remove-specific-punctuation-marks-tp4009795p4009835.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Solr - Remove specific punctuation marks

2012-09-24 Thread Daisy
How could I know which query parser I am using?
Here is the part of my schema that I am using



<fieldType name="text_ar" class="solr.TextField"
positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="(\()"
replacement="" replace="all"/>
  </analyzer>
</fieldType>

<field name="text" type="text_ar" indexed="true" stored="true"
termVectors="true" multiValued="true"/>

As shown, even if I tried to remove "(" the same happens for the parsed query
and for numFound.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Remove-specific-punctuation-marks-tp4009795p4009915.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr - Remove specific punctuation marks

2012-09-24 Thread Daisy
Thanks. Finally it works using:

<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(\()"
replacement="" replace="all"/>

I wonder what is the reason for that, and what is the difference between the
filter and the charFilter?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Remove-specific-punctuation-marks-tp4009795p4009918.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr - Remove specific punctuation marks

2012-09-24 Thread Jonathan Rochkind
When I do things like this and want to avoid empty tokens even though 
previous analysis might result in some--I just throw one of these at the 
end of my analysis chain:


<!-- get rid of empty string tokens. max is required, although
     we don't really care. -->
<filter class="solr.LengthFilterFactory" min="1" max=""/>

A charfilter to filter raw characters can certainly still result in an 
empty token, if an initial token was composed solely of chars you wanted 
to filter out!  In which case you probably want the token to be deleted 
entirely, not still there as an empty token. The above length filter is 
one way to do that, although it unfortunately requires specifying a 'max'
even though I didn't actually want to filter out on the high end, oh well.



On 9/24/2012 1:07 PM, Jack Krupansky wrote:

I tried it and PRFF is indeed generating an empty token. I don't know
how Lucene will index or query an empty term. I mean, what it should
do. In any case, it is best to avoid them.

You should be using a charFilter to simply filter raw characters
before tokenizing. So, try:

<charFilter class="solr.PatternReplaceCharFilterFactory"/>

It has the same pattern and replacement attributes.

-- Jack Krupansky

-Original Message- From: Jack Krupansky
Sent: Monday, September 24, 2012 12:43 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr - Remove specific punctuation marks

1. Which query parser are you using?
2. I see the following comment in the Java 6 doc for regex \p{Punct}:
POSIX character classes (US-ASCII only), so if any of the punctuation is
some higher Unicode character code, it won't be matched/removed.
3. It seems very odd that the parsed query has empty terms - normally the
query parsers will ignore terms that analyze to zero tokens. Maybe your {
is not an ASCII left brace code and is (apparently) unprintable in the
parsed query. Or, maybe there is some encoding problem in the analyzer.

-- Jack Krupansky

-Original Message- From: Daisy
Sent: Monday, September 24, 2012 9:26 AM
To: solr-user@lucene.apache.org
Subject: RE: Solr - Remove specific punctuation marks

I tried &amp; and it solved the 500 error code. But it could still find
punctuation marks.
Although the parsed query didn't contain the punctuation mark,

str name=rawquerystring{/str
str name=querystring{/str
str name=parsedquerytext:/str
str name=parsedquery_toStringtext:/str

but still the numfound gives 1

result name=response numFound=1 start=0

and the highlight shows the result of punctuation mark
em{/em
The steps I did:
1- editing the schema
2- restart the server
3-delete the file
4-index the file




--
View this message in context:
http://lucene.472066.n3.nabble.com/Solr-Remove-specific-punctuation-marks-tp4009795p4009835.html

Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr - Remove specific punctuation marks

2012-09-24 Thread Walter Underwood
I've had problems with empty tokens. You can remove those with this as a step 
in the analyzer chain.

filter class=solr.LengthFilterFactory min=1 max=1024/

wunder

On Sep 24, 2012, at 10:07 AM, Jack Krupansky wrote:

 I tried it and PRFF is indeed generating an empty token. I don't know how 
 Lucene will index or query an empty term. I mean, what it should do. In any 
 case, it is best to avoid them.
 
 You should be using a charFilter to simply filter raw characters before 
 tokenizing. So, try:
 
 charFilter class=solr.PatternReplaceCharFilterFactory/
 
 It has the same pattern and replacement attributes.
 
 -- Jack Krupansky
 
 -Original Message- From: Jack Krupansky
 Sent: Monday, September 24, 2012 12:43 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Solr - Remove specific punctuation marks
 
 1. Which query parser are you using?
 2. I see the following comment in the Java 6 doc for regex \p{Punct}:
 POSIX character classes (US-ASCII only), so if any of the punctuation is
 some higher Unicode character code, it won't be matched/removed.
 3. It seems very odd that the parsed query has empty terms - normally the
 query parsers will ignore terms that analyze to zero tokens. Maybe your {
 is not an ASCII left brace code and is (apparently) unprintable in the
 parsed query. Or, maybe there is some encoding problem in the analyzer.
 
 -- Jack Krupansky
 
 -Original Message- From: Daisy
 Sent: Monday, September 24, 2012 9:26 AM
 To: solr-user@lucene.apache.org
 Subject: RE: Solr - Remove specific punctuation marks
 
 I tried &amp; and it solved the 500 error code. But it could still find
 punctuation marks.
 Although the parsed query didn't contain the punctuation mark,
 
 str name=rawquerystring{/str
 str name=querystring{/str
 str name=parsedquerytext:/str
 str name=parsedquery_toStringtext:/str
 
 but still the numfound gives 1
 
 result name=response numFound=1 start=0
 
 and the highlight shows the result of punctuation mark
 em{/em
 The steps I did:
 1- editing the schema
 2- restart the server
 3-delete the file
 4-index the file
 
 
 
 
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Solr-Remove-specific-punctuation-marks-tp4009795p4009835.html
 Sent from the Solr - User mailing list archive at Nabble.com. 

--
Walter Underwood
wun...@wunderwood.org





Re: Solr - Remove specific punctuation marks

2012-09-24 Thread Daisy
Using solr.LengthFilterFactory was great and also solved the problem of
using PatternReplaceFilter. So now I have two solutions. Thanks all for
helping me. One thing I would like to know: what is the difference between
PatternReplaceFilter and PatternReplaceCharFilter?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Remove-specific-punctuation-marks-tp4009795p4009925.html
Sent from the Solr - User mailing list archive at Nabble.com.


omit tf using per-field CustomSimilarity?

2012-09-24 Thread Carrie Coy
I'm trying to configure per-field similarity to disregard term frequency 
(omitTf) in a 'title' field, following the example docs, but without 
success: my custom similarity doesn't seem to have any effect on 'tf'. 
Is the NoTfSimilarity class below written correctly?   Any 
advice is much appreciated.


my schema.xml:

field name=title type=text_custom_sim indexed=true stored=true 
omitNorms=true termVectors=true /


similarity class=solr.SchemaSimilarityFactory/
fieldType name=text_custom_sim class=solr.TextField 
positionIncrementGap=100 autoGeneratePhraseQueries=true

analyzer type=index
tokenizer class=solr.StandardTokenizerFactory/
filter class=solr.StandardFilterFactory/
similarity class=com.ssww.NoTfSimilarityFactory /
 .
/analyzer
analyzer type=query
tokenizer class=solr.StandardTokenizerFactory/
filter class=solr.StandardFilterFactory/
similarity class=com.ssww.NoTfSimilarityFactory /
 .
/analyzer


NoTfSimilarityFactory.java:

   package com.ssww;

   import org.apache.lucene.search.similarities.Similarity;
   import org.apache.solr.schema.SimilarityFactory;

   public class NoTfSimilarityFactory extends SimilarityFactory {
  @Override
  public Similarity getSimilarity() {
return new NoTfSimilarity();
  }
   }


NoTfSimilarity.java:

   package com.ssww;
   import org.apache.lucene.search.similarities.DefaultSimilarity;

   public final class NoTfSimilarity extends DefaultSimilarity {
// intended to always return a term frequency factor of 1,
// so repeated terms don't boost a document's score
public float tf(int i) {
 return 1;
}

   }

These two files are in a jar in the lib directory of this core.   Here's 
the results of a search for paint with custom and default similarity:


Indexed with per-field NoTfSimilarity:

284.5441 = (MATCH) boost(+(title:paint^8.0 | search_keywords:paint | 
shingle_text:paint^2.0 | description:paint^0.5 | nosyn:paint^5.0 | 
bullets:paint^0.5) () () () () () (),scale(int(page_views),1.0,3.0)), product 
of:
  280.5598 = (MATCH) sum of:
280.5598 = (MATCH) max of:
  280.5598 = (MATCH) weight(title:paint^8.0 in 48) [], result of:
280.5598 = score(doc=48,freq=2.0 = termFreq=2.0
), product of:
  39.83825 = queryWeight, product of:
8.0 = boost
4.979781 = idf(docFreq=187, maxDocs=10059)
1.0 = queryNorm
  7.042474 = fieldWeight in 48, product of:
1.4142135 = tf(freq=2.0), with freq of:
  2.0 = termFreq=2.0
4.979781 = idf(docFreq=187, maxDocs=10059)
1.0 = fieldNorm(doc=48)
  18.217428 = (MATCH) weight(search_keywords:paint in 48) [], result of:
18.217428 = score(doc=48,freq=1.0 = termFreq=1.0
), product of:
  4.268188 = queryWeight, product of:
4.268188 = idf(docFreq=382, maxDocs=10059)
1.0 = queryNorm
  4.268188 = fieldWeight in 48, product of:
1.0 = tf(freq=1.0), with freq of:
  1.0 = termFreq=1.0
4.268188 = idf(docFreq=382, maxDocs=10059)
1.0 = fieldNorm(doc=48)
  7.725952 = (MATCH) weight(description:paint^0.5 in 48) [], result of:
7.725952 = score(doc=48,freq=2.0 = termFreq=2.0
), product of:
  1.6527361 = queryWeight, product of:
0.5 = boost
3.3054721 = idf(docFreq=1002, maxDocs=10059)
1.0 = queryNorm
  4.6746435 = fieldWeight in 48, product of:
1.4142135 = tf(freq=2.0), with freq of:
  2.0 = termFreq=2.0
3.3054721 = idf(docFreq=1002, maxDocs=10059)
1.0 = fieldNorm(doc=48)
  106.50396 = (MATCH) weight(nosyn:paint^5.0 in 48) [], result of:
106.50396 = score(doc=48,freq=4.0 = termFreq=4.0
), product of:
  16.317472 = queryWeight, product of:
5.0 = boost
3.2634945 = idf(docFreq=1045, maxDocs=10059)
1.0 = queryNorm
  6.526989 = fieldWeight in 48, product of:
2.0 = tf(freq=4.0), with freq of:
  4.0 = termFreq=4.0
3.2634945 = idf(docFreq=1045, maxDocs=10059)
1.0 = fieldNorm(doc=48)
 1.0142012 = 
scale(int(page_views)=18,toMin=1.0,toMax=3.0,fromMin=0.0,fromMax=2535.0)


Indexed with DefaultSimilarity:

7.630908 = (MATCH) boost(+(title:paint^8.0 | search_keywords:paint | 
shingle_text:paint^2.0 | description:paint^0.5 | nosyn:paint^5.0 | 
bullets:paint^0.5) () () () () () (),scale(int(page_views),1.0,3.0)), product 
of:
  7.524058 = (MATCH) sum of:
7.524058 = (MATCH) max of:
  7.524058 = (MATCH) weight(title:paint^8.0 in 3504) [DefaultSimilarity], 
result of:
7.524058 = fieldWeight in 3504, product of:
  1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
  5.3203125 = idf(docFreq=197, maxDocs=14892)
  1.0 = fieldNorm(doc=3504)
  0.5091842 = (MATCH) weight(search_keywords:paint in 3504) 
[DefaultSimilarity], result of:
0.5091842 = score(doc=3504,freq=1.0 = termFreq=1.0
), product of:
  

memory leak in pdfbox--SolrCel needs to call COSName.clearResources?

2012-09-24 Thread Kevin Goess
We've been struggling with solr hangs in the solr process that indexes
incoming PDF documents.  TLDR; summary is that I'm thinking that
PDFBox needs to have COSName.clearResources() called on it if the solr
indexer expects to be able to keep running indefinitely.  Is that
likely?  Is there anybody on this list who is doing PDF extraction in
a long-running process and having it work?

The thread dump of a hung process often shows lots of threads hanging on this:

java.lang.Thread.State: BLOCKED (on object monitor)
   at java.util.Collections$SynchronizedMap.get(Collections.java:1975)
   - waiting to lock 0x00072551f908 (a
java.util.Collections$SynchronizedMap)
   at org.apache.pdfbox.util.PDFOperator.getOperator(PDFOperator.java:68)
   at
org.apache.pdfbox.pdfparser.PDFStreamParser.parseNextToken(PDFStreamParser.java:441)

And the heap is almost full:

Heap
   PSYoungGen  total 796416K, used 386330K
eden space 398208K, 97% used
from space 398208K, 0% used
to   space 398208K, 0% used
   PSOldGen
object space 2389376K, 99% used
   PSPermGen
object space 53824K, 99% used

Using eclipse's mat to look at the heap dump of a hung process shows
one of the chief memory leak suspects is PDFBox's COSName class

The class org.apache.pdfbox.cos.COSName, loaded by
java.net.FactoryURLClassLoader @ 0x725a1a230, occupies 151,183,360
(16.64%) bytes. The memory is accumulated in one instance of
java.util.concurrent.ConcurrentHashMap$Segment[] loaded by system
class loader.

and the Shortest Paths To the Accumulation Point graph for that
looks like this:

Class Name                                            Shallow Heap  Retained Heap

java.util.concurrent.ConcurrentHashMap$Segment[16]              80    151,160,680
segments java.util.concurrent.ConcurrentHashMap                 48    151,160,728
nameMap class org.apache.pdfbox.cos.COSName                  1,184    151,183,360
[123] java.lang.Object[2560]                                10,256     26,004,368
elementData java.util.Vector                                    32     26,004,400
classes java.net.FactoryURLClassLoader                          72     26,228,440
classloader class org.apache.pdfbox.cos.COSDocument              8              8
class org.apache.pdfbox.cos.COSDocument                         64      1,703,704
referent java.lang.ref.Finalizer                                40      1,703,744

And the Dominator Tree chart looks like this:

26.69% org.apache.solr.core.SolrCore
16.64% class org.apache.pdfbox.cos.COSName
2.89% java.net.Factory.URLClassLoader

Now the implementation of COSName says this:

 /**
  * Not usually needed except if resources need to be reclaimed in a long
  * running process.
  * Patch provided by fles...@gmail.com
  * incorporated 5/23/08, danielwil...@users.sourceforge.net
  */
 public static synchronized void clearResources()
 {
 // Clear them all
 nameMap.clear();
 }

I *don't* see a call to clearResources anywhere in solr or tika, and I
think that's the problem.  The implementation puts all the COSNames in
a class-level static HashMap, which never gets emptied, and apparently
keeps growing forever.  I suspect the fact that the URLClassLoader is
involved in that graph to the COSNames class is what's filling up the
PermGen space in the heap.

Does that sound likely? Possible?  Can anyone speak to that? Anyone
have suggested next steps for us, besides restarting our solr indexer
process every couple of hours?


Persisting dataimport.properties in ZooKeeper directory

2012-09-24 Thread balaji.gandhi
Hi,

We are working on a DIH for our project and we are persisting the
last_modified_date in the ZooKeeper directory. Our understanding is that the
properties are uploaded to ZooKeeper when the first SOLR node comes up. When
the SOLR nodes are restarted whatever is persisted in the properties is
lost.

Is there another way of maintaining state? Please let us know.

Thanks,
Balaji



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Persisting-dataimport-properties-in-ZooKeeper-directory-tp4009965.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solved: Re: omit tf using per-field CustomSimilarity?

2012-09-24 Thread Carrie Coy
My problem was that I specified the per-field similarity class INSIDE 
the analyzer instead of outside it.


<fieldType>
  <analyzer>...</analyzer>
  <similarity/>
</fieldType>

On 09/24/2012 02:56 PM, Carrie Coy wrote:
I'm trying to configure per-field similarity to disregard term 
frequency (omitTf) in a 'title' field.   I'm trying to follow the 
example docs without success: my custom similarity doesn't seem to 
have any effect on 'tf'.   Is the NoTfSimilarity function below 
written correctly?   Any advice is much appreciated.


my schema.xml:

field name=title type=text_custom_sim indexed=true 
stored=true omitNorms=true termVectors=true /


similarity class=solr.SchemaSimilarityFactory/
fieldType name=text_custom_sim class=solr.TextField 
positionIncrementGap=100 autoGeneratePhraseQueries=true

analyzer type=index
tokenizer class=solr.StandardTokenizerFactory/
filter class=solr.StandardFilterFactory/
similarity class=com.ssww.NoTfSimilarityFactory /
 .
/analyzer
analyzer type=query
tokenizer class=solr.StandardTokenizerFactory/
filter class=solr.StandardFilterFactory/
similarity class=com.ssww.NoTfSimilarityFactory /
 .
/analyzer


NoTfSimilarityFactory.java:

   package com.ssww;

   import org.apache.lucene.search.similarities.Similarity;
   import org.apache.solr.schema.SimilarityFactory;

   public class NoTfSimilarityFactory extends SimilarityFactory {
  @Override
  public Similarity getSimilarity() {
return new NoTfSimilarity();
  }
   }


NoTfSimilarity.java:

   package com.ssww;
   import org.apache.lucene.search.similarities.DefaultSimilarity;

   public final class NoTfSimilarity extends DefaultSimilarity {
public float tf(int i) {
 return 1;
}

   }

These two files are in a jar in the lib directory of this core.   
Here's the results of a search for paint with custom and default 
similarity:


Indexed with per-field NoTfSimilarity:

284.5441 = (MATCH) boost(+(title:paint^8.0 | search_keywords:paint | 
shingle_text:paint^2.0 | description:paint^0.5 | nosyn:paint^5.0 | 
bullets:paint^0.5) () () () () () (),scale(int(page_views),1.0,3.0)), 
product of:

  280.5598 = (MATCH) sum of:
280.5598 = (MATCH) max of:
  280.5598 = (MATCH) weight(title:paint^8.0 in 48) [], result of:
280.5598 = score(doc=48,freq=2.0 = termFreq=2.0
), product of:
  39.83825 = queryWeight, product of:
8.0 = boost
4.979781 = idf(docFreq=187, maxDocs=10059)
1.0 = queryNorm
  7.042474 = fieldWeight in 48, product of:
1.4142135 = tf(freq=2.0), with freq of:
  2.0 = termFreq=2.0
4.979781 = idf(docFreq=187, maxDocs=10059)
1.0 = fieldNorm(doc=48)
  18.217428 = (MATCH) weight(search_keywords:paint in 48) [], 
result of:

18.217428 = score(doc=48,freq=1.0 = termFreq=1.0
), product of:
  4.268188 = queryWeight, product of:
4.268188 = idf(docFreq=382, maxDocs=10059)
1.0 = queryNorm
  4.268188 = fieldWeight in 48, product of:
1.0 = tf(freq=1.0), with freq of:
  1.0 = termFreq=1.0
4.268188 = idf(docFreq=382, maxDocs=10059)
1.0 = fieldNorm(doc=48)
  7.725952 = (MATCH) weight(description:paint^0.5 in 48) [], 
result of:

7.725952 = score(doc=48,freq=2.0 = termFreq=2.0
), product of:
  1.6527361 = queryWeight, product of:
0.5 = boost
3.3054721 = idf(docFreq=1002, maxDocs=10059)
1.0 = queryNorm
  4.6746435 = fieldWeight in 48, product of:
1.4142135 = tf(freq=2.0), with freq of:
  2.0 = termFreq=2.0
3.3054721 = idf(docFreq=1002, maxDocs=10059)
1.0 = fieldNorm(doc=48)
  106.50396 = (MATCH) weight(nosyn:paint^5.0 in 48) [], result of:
106.50396 = score(doc=48,freq=4.0 = termFreq=4.0
), product of:
  16.317472 = queryWeight, product of:
5.0 = boost
3.2634945 = idf(docFreq=1045, maxDocs=10059)
1.0 = queryNorm
  6.526989 = fieldWeight in 48, product of:
2.0 = tf(freq=4.0), with freq of:
  4.0 = termFreq=4.0
3.2634945 = idf(docFreq=1045, maxDocs=10059)
1.0 = fieldNorm(doc=48)
 1.0142012 = 
scale(int(page_views)=18,toMin=1.0,toMax=3.0,fromMin=0.0,fromMax=2535.0)



Indexed with DefaultSimilarity:

7.630908 = (MATCH) boost(+(title:paint^8.0 | search_keywords:paint | 
shingle_text:paint^2.0 | description:paint^0.5 | nosyn:paint^5.0 | 
bullets:paint^0.5) () () () () () (),scale(int(page_views),1.0,3.0)), 
product of:

  7.524058 = (MATCH) sum of:
7.524058 = (MATCH) max of:
  7.524058 = (MATCH) weight(title:paint^8.0 in 3504) 
[DefaultSimilarity], result of:

7.524058 = fieldWeight in 3504, product of:
  1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
  5.3203125 = idf(docFreq=197, maxDocs=14892)
  1.0 = 

Re: solrcloud and csv import hangs

2012-09-24 Thread Yonik Seeley
https://issues.apache.org/jira/browse/SOLR-3883

-Yonik
http://lucidworks.com


On Mon, Sep 24, 2012 at 11:42 AM, Yonik Seeley yo...@lucidworks.com wrote:
 On Mon, Sep 24, 2012 at 11:03 AM, dan sutton danbsut...@gmail.com wrote:
 Hi,

 This appears to happen in trunk too.

 It appears that the add command request parameters get sent to the
 nodes. If I comment these out like so for add and commit:

 core/src/java/org/apache/solr/update/processor/DistributedUpdateProcessor.java

 -  params = new ModifiableSolrParams(req.getParams());
 +  //params = new ModifiableSolrParams(req.getParams());
 +  params = new ModifiableSolrParams();

  Then things work as expected.

  Otherwise params like stream.url get sent to the replicant nodes,
  which causes failure if the file is missing, or worse, repeatedly
  importing the same file if it exists on a replicant.

 This might not be the right thing to do? ... what should be sent here
 for a streaming CSV import?

 Dan


 On Thu, Sep 20, 2012 at 4:32 PM, dan sutton danbsut...@gmail.com wrote:
 Hi,

 I'm using Solr 4.0-BETA and trying to import a CSV file as follows:

 curl http://localhost:8080/solr/core/update -d overwrite=false -d
 commit=true -d stream.contentType='text/csv;charset=utf-8' -d
 stream.url=file:///dir/file.csv

 I have 2 tomcat servers running on different machines and a separate
 zookeeper quorum (3  zoo servers, 2 on same machine).  This is a 1
 shard core, replicated to the other machine.

 It seems that for a 255K line file I have 170 docs on the server that
 issued the command, but on the other, the index seems to grow
 unbounded?

  Has anyone seen this, or been successful in using the CSV import
 with solrcloud?

 Yikes! Thanks for investigating, this looks pretty serious.
 Could you open a JIRA issue for this bug?

 -Yonik
 http://lucidworks.com


CQL instead of SQL in Solr data-config

2012-09-24 Thread PeterKerk
Please see this post here:
http://stackoverflow.com/questions/12324837/apache-cassandra-integration-with-apache-solr/12326329#comment16936430_12326329

Does anyone have experience with or know if it's possible with the Solr
data-config combined with Cassandra JDBC drivers
(http://code.google.com/a/apache-extras.org/p/cassandra-jdbc/) to add CQL to
data-config instead of SQL and query Cassandra instead of a RDBMS?

Thanks!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/CQL-instead-of-SQL-in-Solr-data-config-tp4009984.html
Sent from the Solr - User mailing list archive at Nabble.com.


/solr/dataimport not found

2012-09-24 Thread johnohod
I've been trying to set up Solr with Tomcat, in order to connect to a MySQL
database. I've got the admin page up, but I can't get
localhost:8080/solr/dataimport/ to work. It returns a 404 error.

Been googling high and low, without finding the answer.

I've put this in my solrconfig.xml
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>

Created a data-config.xml in the same directory as the file above. This
should just connect to the DB for now. And copied the MySQL JDBC connector into
the /solr/lib directory.

Any suggestions would be much appreciated.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-dataimport-not-found-tp4009975.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: /solr/dataimport not found

2012-09-24 Thread Michael Della Bitta
Hello, John,

Assuming this is a single core instance of Solr, does
/solr/admin/dataimport.jsp work?

Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor | New York, NY  10017-6271
www.appinions.com
Where Influence Isn’t a Game


On Mon, Sep 24, 2012 at 5:11 PM, johnohod john-o...@tyde.no wrote:
 I've been trying to set up Solr with Tomcat, in order to connect to a MySQL
 database. I've got the admin page up, but I can't get
 localhpst:8080/solr/dataimport/ to work. It returns a 404 errror.

 Been googleing high and low, without finding the answer.

 I've put this in my solrconfig.xml
 requestHandler name=/dataimport
 class=org.apache.solr.handler.dataimport.DataImportHandler
 lst name=defaults
   str name=configdata-config.xml/str
 /lst
   /requestHandler

 Created a data-config.xml in the same directory as the file above. This
 should just connect to DB for now. And copied the JDBC-MYSQL connecter into
 the /solr/lib directory.

 Any suggestions would be much appreciated.



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/solr-dataimport-not-found-tp4009975.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: /solr/dataimport not found

2012-09-24 Thread Chris Hostetter

: database. I've got the admin page up, but I can't get
: localhpst:8080/solr/dataimport/ to work. It returns a 404 errror.

1) which version of solr are you using?
2) did you try localhost:8080/solr/dataimport (no trailing slash) ?
3) does anything in the admin UI work?


-Hoss


Re: Range operator problems in Chef ( automating framework)

2012-09-24 Thread Erick Erickson
Be a little careful, spaces here can mess you up, particularly
around the hyphen in -1HOUR. I.e. NOW -1HOUR is invalid but
NOW-1HOUR is OK (note the space between the W and the -). There aren't
any in your example, but just to be sure...

One other note: you may get better performance out of making this
a filter query (fq), so it can be re-used, but you'll want to do some date
rounding, see:
http://searchhub.org/dev/2012/02/23/date-math-now-and-filter-queries/
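
For example, something like this (rounding to the hour so the fq string,
and therefore the cached filter, is re-usable for the rest of the hour):

  q=role:JSlave
  fq=ohai_time:[NOW/HOUR-1HOUR TO *]

Whether Chef lets you pass a separate fq param is a different question, of
course, but that's the shape you want on the Solr side.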

But one easy place to look at what eventually gets to Solr is the Solr logs,
the queries are all put in that file as they come in (along with a lot of
other stuff), so you have a chance to see whether what you're doing in Chef
is getting to Solr as you wish...

Best
Erick


On Mon, Sep 24, 2012 at 9:42 AM, Jack Krupansky j...@basetechnology.com wrote:
 That looks like a valid Solr date math expression, but you need to make sure
 that the field type is actually a Solr DateField as opposed to simply an
 integer Unix time value.

 -- Jack Krupansky

 -Original Message- From: Christian Bordis
 Sent: Monday, September 24, 2012 7:16 AM
 To: solr-user@lucene.apache.org
 Subject: Range operator problems in Chef ( automating framework)


 Hi Everyone!

 We doing some nice stuff with Chef
 (http://wiki.opscode.com/display/chef/Home).  It uses solr for search but
 range queries don't work as expected. Maybe chef, solr just buggy or I am
 doing it wrong ;-)

  In chef I have a bunch of nodes with a timestamp attribute. Now I want to
  search for nodes which have a timestamp not older than one hour:

 search(:node, role:JSlave AND ohai_time:[NOW-1HOUR TO *])

 Is this string in the call a solr compliant range expression at all?
  Unfortunately, I have no toys at hand to verify this myself at the moment...
  but I'm working on it.

 Thanks for reading! ^^

 Kind Regards,

 Christian Bordis


Re: Solr Cell Questions

2012-09-24 Thread Erick Erickson
If you're concerned about throughput, consider moving all the
SolrCell (Tika) processing off the server. SolrCell is way cool
for showing what can be done, but its downside is you're
moving all the processing of the structured documents to the
same machine doing the indexing. Pretty soon, especially
with significant size files, you're spending all your CPU cycles
parsing the files...

Happens there's a blog about this:
http://searchhub.org/dev/2012/02/14/indexing-with-solrj/

By moving the indexing to N clients, you can increase
throughput until you make Solr work hard to do the indexing
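
A rough sketch of that kind of client (untested; the field names, the URL,
and the unlimited BodyContentHandler are just illustrations, adjust to your
schema):

import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;

public class ClientSideIndexer {
  public static void main(String[] args) throws Exception {
    SolrServer solr = new HttpSolrServer("http://localhost:8983/solr");
    AutoDetectParser parser = new AutoDetectParser();
    for (String path : args) {
      // Tika runs here on the client, not on the Solr box
      BodyContentHandler text = new BodyContentHandler(-1);
      Metadata metadata = new Metadata();
      InputStream in = new FileInputStream(path);
      try {
        parser.parse(in, text, metadata, new ParseContext());
      } finally {
        in.close();
      }
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", path);                     // assumes id is the uniqueKey
      doc.addField("title", metadata.get("title")); // assumed schema field
      doc.addField("text", text.toString());        // assumed schema field
      solr.add(doc);
    }
    solr.commit();
  }
}

Run N copies of that in parallel (or swap in ConcurrentUpdateSolrServer if
your SolrJ has it) and the Tika load scales with client machines instead of
the Solr box.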

Best
Erick

On Mon, Sep 24, 2012 at 10:04 AM,  johannes.schwendin...@blum.com wrote:
 Hi,

 I'm currently experimenting with Solr Cell to index files to Solr. During
 this, some questions came up.

 1. Is it possible (and wise) to connect to Solr Cell with multiple Threads
 at the same time to index several documents at the same time?
 This question came up because my program takes about 6 hours to index
 around 35000 docs. (no production environment, only the example solr and a
 little desktop machine, but I think it's very slow, and I know solr isn't
 the bottleneck (yet))

 2. If 1 is possible, how many Threads should do this and how many memory
 Solr needs? I've tried it but i run into an out of memory exception.

 Thanks in advance

 Best Regards
 Johannes


Re: Solr - Remove specific punctuation marks

2012-09-24 Thread Shawn Heisey

On 9/24/2012 11:37 AM, Daisy wrote:

One thing I would like to know: what is the difference between
PatternReplaceFilter and PatternReplaceCharFilter?


The CharFilter version gets applied before anything else, including the 
Tokenizer.  The Filter version gets applied in the order specified in 
the schema file.  I would imagine that if you are allowed to specify 
multiple CharFilter entries (which I have never tested), they would be 
applied in the order they occur, all of them before the Tokenizer.
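
For illustration (a made-up field type; only the ordering matters):

<fieldType name="text_example" class="solr.TextField">
  <analyzer>
    <!-- charFilters see the raw character stream, before the tokenizer -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="\(" replacement=""/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- token filters run after the tokenizer, top to bottom -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="\{" replacement="" replace="all"/>
    <filter class="solr.LengthFilterFactory" min="1" max="1024"/>
  </analyzer>
</fieldType>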


Thanks,
Shawn



Admin-UI: multiple facet

2012-09-24 Thread Alexandre Rafalovitch
Hello,

Is there a way to provide multiple facet field names in the Admin UI?
I have tried spaces, commas and semi-colons, to no effect. Would have
been nice to be able to push the UI just a tiny bit further before
switching to the URL query string directly.

Or is single facet field a limitation of - otherwise excellent - new UI?

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


Re: memory leak in pdfbox--SolrCel needs to call COSName.clearResources?

2012-09-24 Thread Chris Hostetter

: We've been struggling with solr hangs in the solr process that indexes
: incoming PDF documents.  TLDR; summary is that I'm thinking that
: PDFBox needs to have COSName.clearResources() called on it if the solr
: indexer expects to be able to keep running indefinitely.  Is that

I don't know much about tika/pdfbox, but based on the details in your 
email I think your assessment is correct.

Solr (and SolrCell) doesn't directly know about PDFBox at all -- that's all 
handled under the covers by Tika.  So I suspect you'd need to file a Jira 
with the Tika project to request that Tika somewhere/somehow call this 
COSName.clearResources() method when using PDFBox -- although based on your 
description, I'm not sure when/where this would make sense.

A few workarounds I can imagine:

1) if you do a SolrCore RELOAD all of the plugin classes will be 
reloaded in a new ClassLoader (assuming you haven't embedded them directly 
in the solr.war, or asked your servlet container to load them for you) ... 
this would be marginally better than doing a full server restart.

2) if you are comfortable with java code, you could write a small 
RequestHandler that did nothing more than call COSName.clearResources() on 
each request -- you could then ping it on a regular basis, or register it 
as part of a newSearcher QuerySenderListener to ensure it got called 
automatically (or implement SolrEventListener directly and trigger 
it on every commit).
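
A minimal sketch of what that handler might look like (untested; assumes the 
pdfbox jar is on the core's classpath):

import org.apache.pdfbox.cos.COSName;
import org.apache.solr.handler.RequestHandlerBase;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;

public class ClearPdfBoxResourcesHandler extends RequestHandlerBase {
  @Override
  public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) {
    // empty PDFBox's static COSName map so it can't grow forever
    COSName.clearResources();
    rsp.add("status", "COSName resources cleared");
  }

  @Override
  public String getDescription() {
    return "Clears the PDFBox COSName name map";
  }

  @Override
  public String getSource() {
    return "";
  }
}

...registered under some name like /clearpdfbox in solrconfig.xml and hit 
from cron every so often.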

3) heck: with the new ScriptUpdateProcessor in Solr 4.0, you could 
write some javascript in your solrconfig.xml that would call this method 
as part of the chain's processCommit() method.

-Hoss


Re: SolrJ - IOException

2012-09-24 Thread roz dev
I have seen this happening

We retry and that works. Is your solr server stalled?
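
Nothing fancy on our end, roughly this (the attempt count and backoff are
arbitrary):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.common.SolrInputDocument;

public final class RetryingAdd {
  /** Retry SolrServer.add() a few times before giving up. */
  public static void addWithRetry(SolrServer server, SolrInputDocument doc)
      throws Exception {
    int attempts = 0;
    while (true) {
      try {
        server.add(doc);
        return;
      } catch (SolrServerException e) {
        if (++attempts >= 3) throw e;    // give up after 3 tries
        Thread.sleep(1000L * attempts);  // simple linear backoff
      }
    }
  }
}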

On Mon, Sep 24, 2012 at 4:50 PM, balaji.gandhi
balaji.gan...@apollogrp.eduwrote:

 Hi,

 I am encountering this error randomly (under load) when posting to Solr
 using SolrJ.

 Has anyone encountered a similar error?

 org.apache.solr.client.solrj.SolrServerException: IOException occured when
 talking to server at: http://localhost:8080/solr/profile at

 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:414)
 at

 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:182)
 at

 org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
 at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:122) at
 org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:107) at

 Thanks,
 Balaji



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/SolrJ-IOException-tp4010026.html
 Sent from the Solr - User mailing list archive at Nabble.com.



How to more gracefully handle field format exceptions?

2012-09-24 Thread Aaron Daubman
Greetings,

Is there a way to configure more graceful handling of field formatting
exceptions when indexing documents?

Currently, there is a field being generated in some documents that I
am indexing that is supposed to be a float but sometimes slips
through as an empty string. (I know, fix the docs, but sometimes bad
values slip through, and it would be nice to handle them in a more
forgiving manner).

Here's an example of the exception - when this happens, the entire doc
is thrown out due to the one malformed field:
---snip---
ERROR org.apache.solr.core.SolrCore -
org.apache.solr.common.SolrException: ERROR: [doc=docidstr] Error
adding field 'f_floatfield'=''
...
Caused by: java.lang.NumberFormatException: empty String

00:56:46,288 [SI] WARN  com.company.IndexerThread - BAD DOC:
a82a2f6a6a42ad3c98a05ddb3f2c382c
01:02:12,713 [SI] ERROR org.apache.solr.core.SolrCore -
org.apache.solr.common.SolrException: ERROR:
[doc=6ff90020f9ec0f6dd623e9879c3e024d] Error adding field
'f_afloatfield'=''
at 
org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:333)
at 
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:157)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
at 
org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:142)
at 
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:121)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:106)
at com.company.IndexerThread.run(IndexerThread.java:55)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.NumberFormatException: empty String
at 
sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1011)
at java.lang.Float.parseFloat(Float.java:452)
at org.apache.solr.schema.TrieField.createField(TrieField.java:410)
at org.apache.solr.schema.SchemaField.createField(SchemaField.java:103)
at 
org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:203)
at 
org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:286)
... 12 more

01:02:12,713 [SI] WARN  com.company.IndexerThread - BAD DOC:
6ff90020f9ec0f6dd623e9879c3e024d
---snip---

In my thinking (and for this situation), it would be much better to
just ignore the malformed field and keep the doc - is there any way to
configure this or enable this behavior instead?

Thanks,
 Aaron


Re: How to more gracefully handle field format exceptions?

2012-09-24 Thread Otis Gospodnetic
Hi Aaron,

You could catch the error on the client, fix/clean/remove, and retry, no?

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html


On Mon, Sep 24, 2012 at 9:21 PM, Aaron Daubman daub...@gmail.com wrote:
 Greetings,

 Is there a way to configure more graceful handling of field formatting
 exceptions when indexing documents?

 Currently, there is a field being generated in some documents that I
 am indexing that is supposed to be a float but some times slips
 through as an empty string. (I know, fix the docs, but sometimes bad
 values slip through, and it would be nice to handle them in a more
 forgiving manner).

 Here's an example of the exception - when this happens, the entire doc
 is thrown out due to the one malformed field:
 ---snip---
 ERROR org.apache.solr.core.SolrCore -
 org.apache.solr.common.SolrException: ERROR: [doc=docidstr] Error
 adding field 'f_floatfield'=''
 ...
 Caused by: java.lang.NumberFormatException: empty String

 00:56:46,288 [SI] WARN  com.company.IndexerThread - BAD DOC:
 a82a2f6a6a42ad3c98a05ddb3f2c382c
 01:02:12,713 [SI] ERROR org.apache.solr.core.SolrCore -
 org.apache.solr.common.SolrException: ERROR:
 [doc=6ff90020f9ec0f6dd623e9879c3e024d] Error adding field
 'f_afloatfield'=''
 at 
 org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:333)
 at 
 org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
 at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:157)
 at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
 at 
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
 at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
 at 
 org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:142)
 at 
 org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
 at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:121)
 at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:106)
 at com.company.IndexerThread.run(IndexerThread.java:55)
 at java.lang.Thread.run(Thread.java:722)
 Caused by: java.lang.NumberFormatException: empty String
 at 
 sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1011)
 at java.lang.Float.parseFloat(Float.java:452)
 at org.apache.solr.schema.TrieField.createField(TrieField.java:410)
 at 
 org.apache.solr.schema.SchemaField.createField(SchemaField.java:103)
 at 
 org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:203)
 at 
 org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:286)
 ... 12 more

 01:02:12,713 [SI] WARN  com.company.IndexerThread - BAD DOC:
 6ff90020f9ec0f6dd623e9879c3e024d
 ---snip---

 In my thinking (and for this situation), it would be much better to
 just ignore the malformed field and keep the doc - is there any way to
 configure this or enable this behavior instead?

 Thanks,
  Aaron


Re: How to more gracefully handle field format exceptions?

2012-09-24 Thread Aaron Daubman
Hi Otis,

I was just looking at how to implement that, but was hoping for a
cleaner method - it seems like I will have to actually parse the error
as text to find the field that caused it, then remove/mangle that
field and attempt re-adding the document - which seems less than
ideal.

I would think there would be a flag or an easy way to override the add
method that would just drop (or set to default value) any field that
didn't meet expectations.

Thanks for the suggestion,
 Aaron
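
For now I may just scrub the known float fields on the client before the
add, something like this (the field list is hard-coded for illustration):

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

import org.apache.solr.common.SolrInputDocument;

public final class DocSanitizer {
  // the fields our schema declares as floats (illustrative list)
  private static final Set<String> FLOAT_FIELDS =
      new HashSet<String>(Arrays.asList("f_floatfield", "f_afloatfield"));

  /** Drop empty-string values from float fields so one bad field
   *  doesn't get the whole document rejected. */
  public static void clean(SolrInputDocument doc) {
    for (String name : FLOAT_FIELDS) {
      Object value = doc.getFieldValue(name);
      if (value instanceof String && ((String) value).trim().isEmpty()) {
        doc.removeField(name);
      }
    }
  }
}

I suppose a custom UpdateRequestProcessor could do the same scrubbing on the
server side.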

On Mon, Sep 24, 2012 at 9:24 PM, Otis Gospodnetic
otis.gospodne...@gmail.com wrote:
 Hi Aaron,

 You could catch the error on the client, fix/clean/remove, and retry, no?

 Otis
 --
 Search Analytics - http://sematext.com/search-analytics/index.html
 Performance Monitoring - http://sematext.com/spm/index.html


 On Mon, Sep 24, 2012 at 9:21 PM, Aaron Daubman daub...@gmail.com wrote:
 Greetings,

 Is there a way to configure more graceful handling of field formatting
 exceptions when indexing documents?

 Currently, there is a field being generated in some documents that I
 am indexing that is supposed to be a float but some times slips
 through as an empty string. (I know, fix the docs, but sometimes bad
 values slip through, and it would be nice to handle them in a more
 forgiving manner).

 Here's an example of the exception - when this happens, the entire doc
 is thrown out due to the one malformed field:
 ---snip---
 ERROR org.apache.solr.core.SolrCore -
 org.apache.solr.common.SolrException: ERROR: [doc=docidstr] Error
 adding field 'f_floatfield'=''
 ...
 Caused by: java.lang.NumberFormatException: empty String

 00:56:46,288 [SI] WARN  com.company.IndexerThread - BAD DOC:
 a82a2f6a6a42ad3c98a05ddb3f2c382c
 01:02:12,713 [SI] ERROR org.apache.solr.core.SolrCore -
 org.apache.solr.common.SolrException: ERROR:
 [doc=6ff90020f9ec0f6dd623e9879c3e024d] Error adding field
 'f_afloatfield'=''
 at 
 org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:333)
 at 
 org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
 at 
 org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:157)
 at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
 at 
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
 at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
 at 
 org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:142)
 at 
 org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
 at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:121)
 at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:106)
 at com.company.IndexerThread.run(IndexerThread.java:55)
 at java.lang.Thread.run(Thread.java:722)
 Caused by: java.lang.NumberFormatException: empty String
 at 
 sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1011)
 at java.lang.Float.parseFloat(Float.java:452)
 at org.apache.solr.schema.TrieField.createField(TrieField.java:410)
 at 
 org.apache.solr.schema.SchemaField.createField(SchemaField.java:103)
 at 
 org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:203)
 at 
 org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:286)
 ... 12 more

 01:02:12,713 [SI] WARN  com.company.IndexerThread - BAD DOC:
 6ff90020f9ec0f6dd623e9879c3e024d
 ---snip---

 In my thinking (and for this situation), it would be much better to
 just ignore the malformed field and keep the doc - is there any way to
 configure this or enable this behavior instead?

 Thanks,
  Aaron


How can I create about 100000 independent indexes in Solr?

2012-09-24 Thread 韦震宇
Dear all,
The company I'm working for has a website serving more than 100000 
customers, and every customer should have its own search category. So I 
should create an independent index for every customer.
The site http://wiki.apache.org/solr/MultipleIndexes gives some solutions for 
creating multiple indexes.
I want to use the multicore solution, but I'm afraid that Solr can't support 
so many indexes that way.
The other solution, flattening data into a single index, is a choice, but I 
think it's best to keep all indexes independent.
Could you tell me how to create about 100000 independent indexes in Solr?
Thank you all for replying!


Re: Solr Swap Function doesn't work when using Solr Cloud Beta

2012-09-24 Thread sam fang
Hi Mark,

If it can be supported in the future, I think that would be great. It's a
really useful feature. For example, a user can use it to refresh with a
totally new core: build the index on one core, and after the build is done,
swap the old core and the new core to get a completely fresh core for search.

It can also be used for backups. If one core crashes, you can easily swap in
the backup core and quickly serve search requests.

Best Regards,
Sam

On Sun, Sep 23, 2012 at 2:51 PM, Mark Miller markrmil...@gmail.com wrote:

 FYI swap is def not supported in SolrCloud right now - even though it may
 work, it's not been thought about and there are no tests.

 If you would like to see support, I'd add a JIRA issue along with any
 pertinent info from this thread about what the behavior needs to be changed
 to.

 - Mark

 On Sep 21, 2012, at 6:49 PM, sam fang sam.f...@gmail.com wrote:

  Hi Chris,
 
  Thanks for your help. Today I tried again and try to figure out the
 reason.
 
  1. set up an external zookeeper server.
 
  2. change /opt/solr/apache-solr-4.0.0-BETA/example/solr/solr.xml
 persistent
  to true. and run below command to upload config to zk. (renamed multicore
  to solr, and need to put zkcli.sh related jar package.)
  /opt/solr/apache-solr-4.0.0-BETA/example/cloud-scripts/zkcli.sh -cmd
  upconfig -confdir
 /opt/solr/apache-solr-4.0.0-BETA/example/solr/core0/conf/
  -confname
  core0 -z localhost:2181
  /opt/solr/apache-solr-4.0.0-BETA/example/cloud-scripts/zkcli.sh -cmd
  upconfig -confdir
 /opt/solr/apache-solr-4.0.0-BETA/example/solr/core1/conf/
  -confname
  core1 -z localhost:2181
 
  3. Start jetty server
  cd /opt/solr/apache-solr-4.0.0-BETA/example
  java -DzkHost=localhost:2181 -jar start.jar
 
  4. publish message to core0
  /opt/solr/apache-solr-4.0.0-BETA/example/solr/exampledocs
  cp ../../exampledocs/post.jar ./
  java -Durl=http://localhost:8983/solr/core0/update -jar post.jar
  ipod_video.xml
 
  5. query to core0 and core1 is ok.
 
  6. Click swap in the admin page, the query to core0 and core1 is
  changing. Previous I saw sometimes returns 0 result. sometimes return 1
  result. Today
  seems core0 still return 1 result, core1 return 0 result.
 
  7. Then click reload in the admin page, the query to core0 and core1.
  Sometimes return 1 result, and sometimes return nothing. Also can see
 the zk
  configuration also changed.
 
  8. Restart jetty server. If do the query, it's same as what I saw in
 step 7.
 
  9. Stop jetty server, then log into zkCli.sh, then run command set
  /clusterstate.json {}. then start jetty again. everything back to
 normal,
  that is what previous swap did in solr 3.6 or solr 4.0 w/o cloud.
 
 
  From my observation, after a swap it seems to put the shard information
  into actualShards; when a user requests a search, it will use all the
  shard information to do the search. But the user can't see the zk update
  until clicking the reload button in the admin page. When the web server is
  restarted, this shard information eventually goes to zk, and the search
  goes to all shards.
 
  I found there is an option distrib, and used a url like
  http://host1:18000/solr/core0/select?distrib=false&q=*%3A*&wt=xml, then I
  only get the data on core0. I dug into the code (the handleRequestBody
  method in the SearchHandler class), and it seems to make sense.
 
  I tried stopping the tomcat server, then using the command set
  /clusterstate.json {} to clean all cluster state, then using the command
  cloud-scripts/zkcli.sh -cmd upconfig to upload the config to the zk
  server, and starting the tomcat server. It rebuilt the right shard
  information in zk, and then the search function was back to normal, like
  what we saw in 3.6 or 4.0 w/o cloud.
 
  Seems solr always adds shard information into zk.
 
  I tested cloud swap on a single machine: if each core has one shard in
  the zk, then after a swap zk eventually has 2 slices (shards) for that
  core, because now it only does the add. So the search will go to both
  shards.
 
  and I tested cloud swap with 2 machines where each core has 1 shard and 2
  slices. Below is the configuration in zk. After a swap, zk eventually has
  4 entries for that core, and search will get messed up.
 
   "core0":{"shard1":{
     "host1:18000_solr_core0":{
       "shard":"shard1",
       "roles":null,
       "leader":"true",
       "state":"active",
       "core":"core0",
       "collection":"core0",
       "node_name":"host1:18000_solr",
       "base_url":"http://host1:18000/solr"},
     "host2:18000_solr_core0":{
       "shard":"shard1",
       "roles":null,
       "state":"active",
       "core":"core0",
       "collection":"core0",
       "node_name":"host2:18000_solr",
       "base_url":"http://host2:18000/solr"}}},
 
  For the previous 2 cases, if I stopped the tomcat/jetty server, then
  manually uploaded the configuration to zk, then started the tomcat
  server, zk and search became normal.
 
  On Fri, Sep 21, 2012 at 3:34 PM, Chris Hostetter
  hossman_luc...@fucit.orgwrote:
 
 
  : Below is my solr.xml configuration, and already set persistent to
 true.
 ...
  : Then publish 1 record to test1, and query. it's ok now.
 
  Ok, first off -- 

UIMA for lemmatization

2012-09-24 Thread abhayd
Hi,
I'm new to UIMA. Solr does not have a lemmatization component, so I was
thinking of using UIMA for this.

Is this a correct choice, and if so, how would I go about it? Any ideas?

I see a couple of links for Solr-UIMA integration but don't know how they
can be used for lemmatization.

Any thoughts?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/UIMA-for-lemmatization-tp4010056.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Performance Degradation on Migrating from 1.3 to solr 3.6.1

2012-09-24 Thread Sujatha Arun
Any comments on this?



On Mon, Sep 24, 2012 at 10:28 PM, Sujatha Arun suja.a...@gmail.com wrote:

 Thanks Jack.

 so Qtime = Sum of all prepare components + sum of all process components -
 Debug comp process/prepare time

 In 3.6.1 the process part of the QueryComponent for the following query
 seems to take about 8 times more time (26ms vs 3ms in the process sections
 below). Anything missing? For most queries the process part of the
 QueryComponent seems to take more time in 3.6.1.


 This is *3.6.1*:

 <response>
 <lst name="responseHeader">
   <int name="status">0</int>
   *<int name="QTime">33</int>*
   <lst name="params">
     <str name="debugQuery">on</str>
     <str name="indent">on</str><str name="start">0</str>
     *<str name="q">differential AND equations AND has AND one AND solution</str>*
     <str name="rows">10</str><str name="version">2.2</str></lst></lst>

 *Debug Output*

 <str name="QParser">LuceneQParser</str>
 <lst name="timing">
   <double name="time">33.0</double>
   <lst name="prepare">
     <double name="time">3.0</double>
     *<lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">3.0</double></lst>*
     <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
     <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
     <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
     <lst name="org.apache.solr.handler.component.StatsComponent"><double name="time">0.0</double></lst>
     <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst>
   </lst>
   <lst name="process">
     <double name="time">30.0</double>
     *<lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">26.0</double></lst>*
     <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
     <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
     <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
     <lst name="org.apache.solr.handler.component.StatsComponent"><double name="time">0.0</double></lst>
     <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">4.0</double></lst>

 *Same query in solr 1.3:*

 <lst name="responseHeader">
   <int name="status">0</int>
   *<int name="QTime">6</int>*
   <lst name="params">
     <str name="debugQuery">on</str><str name="indent">on</str>
     <str name="start">0</str>
     <str name="q">differential AND equations AND has AND one AND solution</str>
     <str name="rows">10</str><str name="version">2.2</str>

 Debug Info

 <lst name="timing">
   <double name="time">6.0</double>
   <lst name="prepare">
     <double name="time">1.0</double>
     *<lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">1.0</double></lst>*
     <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
     <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
     <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
     <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst></lst>
   <lst name="process">
     <double name="time">5.0</double>
     *<lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">3.0</double></lst>*
     <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
     <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
     <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
     <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">2.0</double>



 *On Mon, Sep 24, 2012 at 7:35 PM, Jack Krupansky 
 j...@basetechnology.comwrote:
 *

 Run a query on both old and new with debugQuery=true on your query
 request and look at the component timings for possible insight.

 -- Jack Krupansky

 From: Sujatha Arun
 Sent: Monday, September 24, 2012 7:26 AM
 To: solr-user@lucene.apache.org
 Subject: Performance Degradation on Migrating from 1.3 to solr 3.6.1

 Hi,

 On migrating from 1.3 to 3.6.1, I see the query performance degrading
 by nearly 2 times for all types of queries, and a slight indexing
 performance degradation over 1.3. For indexing we use our custom scripts
 that post xml over HTTP.

 Anything that I might have missed? I am thinking that this might be due
 to the new Tiered MP over LogByteSize creating more segment files and hence
 more query latency. We are using Compound Files in 1.3 and I have set
 this to true even in 3.6.1, but it results in more segment files.

 On optimizing, the query response time improved beyond 1.3. So could it
 be the MP, or am I missing something here? Do let me know.

 Please find attached the solrconfig.xml

 Regards
 Sujatha





Re: Performance Degradation on Migrating from 1.3 to solr 3.6.1

2012-09-24 Thread Sujatha Arun
Hi ,

Please comment on whether I should consider moving back to the old
LogByteSize MP on moving to 3.6.1 from 1.3, as I see improvements in query
performance on optimization.

Just to mention, we have a lot of indexes in multiple cores as well as
multiple webapps, and that's the reason we went for CFS in 1.3: to avoid the
too-many-open-files issue which we had encountered.

Regards
Sujatha

On Tue, Sep 25, 2012 at 9:55 AM, Sujatha Arun suja.a...@gmail.com wrote:

 Any comments on this?



 On Mon, Sep 24, 2012 at 10:28 PM, Sujatha Arun suja.a...@gmail.comwrote:

 Thanks Jack.

 so Qtime = Sum of all prepare components + sum of all process components
 - Debug comp process/prepare time

 In 3.6.1 the process part of Query component for the following query
 seems to take  8 times more time?  anything missing? For most queries the
 process part of the Querycomponent seem to take more time in 3.6.1


 This is *3.6.1 *

 response
 lst name=responseHeaderint name=status0/int
 *int name=QTime33/int*
 lst name=params
 str name=debugQueryon/str
 str name=indenton/strstr name=start0/str
 *str name=qdifferential AND equations AND has AND one AND
 solution/str*
 str name=rows10/strstr name=version2.2/str/lst/lst

 *Debug Output*
 *
 *
 str name=QParserLuceneQParser/str
 lst name=timing
 double name=time33.0/double
 lst name=prepare

 double name=time3.0/double
 *lst name=org.apache.solr.handler.component.QueryComponentdouble
 name=time3.0/double/lst*lst
 name=org.apache.solr.handler.component.FacetComponentdouble
 name=time0.0/double/lstlst
 name=org.apache.solr.handler.component.MoreLikeThisComponentdouble
 name=time0.0/double/lstlst
 name=org.apache.solr.handler.component.HighlightComponentdouble
 name=time0.0/double/lstlst
 name=org.apache.solr.handler.component.StatsComponentdouble
 name=time0.0/double/lstlst
 name=org.apache.solr.handler.component.DebugComponentdouble
 name=time0.0/double

 /lst/lstlst name=processdouble name=time30.0/double
 *lst name=org.apache.solr.handler.component.QueryComponentdouble
 name=time26.0/double/lst*
 lst name=org.apache.solr.handler.component.FacetComponentdouble
 name=time0.0/double/lstlst
 name=org.apache.solr.handler.component.MoreLikeThisComponentdouble
 name=time0.0/double/lstlst
 name=org.apache.solr.handler.component.HighlightComponentdouble
 name=time0.0/double/lstlst
 name=org.apache.solr.handler.component.StatsComponentdouble
 name=time0.0/double/lstlst
 name=org.apache.solr.handler.component.DebugComponentdouble
 name=time4.0/double/lst

 *Same query in solr 1.3*
 *
 *
 lst name=responseHeader
 int name=status0/int
 *int name=QTime6/int*
 lst name=params
 str name=debugQueryon/strstr name=indenton/str
 str name=start0/strstr name=qdifferential AND equations AND has
 AND one AND solution/str
 str name=rows10/strstr name=version2.2/str

 Debug Info

 lst name=timing
 double name=time6.0/double
 lst name=prepare
 double name=time1.0/double
 *lst name=org.apache.solr.handler.component.QueryComponentdouble
 name=time1.0/double/lst*
 lst name=org.apache.solr.handler.component.FacetComponentdouble
 name=time0.0/double/lstlst
 name=org.apache.solr.handler.component.MoreLikeThisComponentdouble
 name=time0.0/double/lstlst
 name=org.apache.solr.handler.component.HighlightComponentdouble
 name=time0.0/double/lstlst
 name=org.apache.solr.handler.component.DebugComponentdouble
 name=time0.0/double/lst/lst

 lst name=process
 double name=time5.0/double
 *lst name=org.apache.solr.handler.component.QueryComponentdouble
 name=time3.0/double/lst*
 lst name=org.apache.solr.handler.component.FacetComponentdouble
 name=time0.0/double/lstlst
 name=org.apache.solr.handler.component.MoreLikeThisComponentdouble
 name=time0.0/double/lstlst
 name=org.apache.solr.handler.component.HighlightComponentdouble
 name=time0.0/double/lstlst
 name=org.apache.solr.handler.component.DebugComponentdouble
 name=time2.0/double



 *On Mon, Sep 24, 2012 at 7:35 PM, Jack Krupansky j...@basetechnology.com
  wrote:*

 Run a query on both old and new with debugQuery=true on your query
 request and look at the component timings for possible insight.

 -- Jack Krupansky

 From: Sujatha Arun
 Sent: Monday, September 24, 2012 7:26 AM
 To: solr-user@lucene.apache.org
 Subject: Performance Degradation on Migrating from 1.3 to solr 3.6.1

 Hi,

 On migrating from 1.3 to 3.6.1  , I see the query performance degrading
 by nearly 2 times for all types of query.  Indexing performance slight
 degradation over 1.3 For Indexing we use our custom scripts that post xml
 over HTTP.

 Any thing that I might have missed . I am thinking that this might be
 due to new Tiered MP over LogByteSize creating more segment files and hence
 more the Query latency .We are using Compound Files in 1.3  and I have set
 this to true even in 3.6.1 ,but results in more segement files

 On optimizing the query response time improved beyond 1.3  .So could it
 be the MP or am i missing something here . Do let me know

 Please find attached the solrconfig.xml