Re: Solr Update URI is not found
On 28 Oct 2013, at 01:19, Bayu Widyasanyata bwidyasany...@gmail.com wrote: request: http://localhost:8080/solr/update?wt=javabin&version=2 I think this URL is incorrect: there should be a core name between solr and update.
Re: Solr Update URI is not found
On Mon, Oct 28, 2013 at 1:26 PM, Raymond Wiker rwi...@gmail.com wrote: request: http://localhost:8080/solr/update?wt=javabin&version=2 I think this URL is incorrect: there should be a core name between solr and update.

I changed the Solr URL in the crawl script's options to:

./bin/crawl urls/seed.txt TestCrawl http://localhost:8080/solr/mycollection/2

And the result now is Bad Request. I will look for other misconfigurations...

=
org.apache.solr.common.SolrException: Bad Request
Bad Request
request: http://localhost:8080/solr/mycollection/update?wt=javabin&version=2
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
at org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:155)
at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:118)
at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:44)
at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.close(ReduceTask.java:467)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:535)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
2013-10-28 13:30:02,804 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:123)
at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:185)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:195)
-- wassalam, [bayu]
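[Editor's note] For reference, the URL shape being suggested above, with a placeholder standing in for whatever the real core is called:

```
http://localhost:8080/solr/<corename>/update?wt=javabin&version=2
```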
One of all shard stopping, all shards stop
Hi. I have a 3-shard SolrCloud (version 4.4.0) with no replication. http://lucene.472066.n3.nabble.com/file/n4098015/ex1.png For example, if one shard (a leader) dies from an OOM, all shards stop. Is that just the way it is? I want to find an option for this problem: if one shard dies, the remaining shards should keep serving requests normally. Thank you. -- View this message in context: http://lucene.472066.n3.nabble.com/One-of-all-shard-stopping-all-shards-stop-tp4098015.html Sent from the Solr - User mailing list archive at Nabble.com.
Optimal interval for soft commit
Hello, We have a Solr index with about 1M docs. Every day we add 5,000 to 8,000 docs. We have defined a 15 sec interval for soft commit, but to an impatient user 15 secs looks like an eternity. The wiki http://wiki.apache.org/solr/NearRealtimeSearch advises a 1s soft commit interval but warns: "Be sure to pay special attention to cache and autowarm settings as they can have a significant impact on NRT performance". I was looking at CommitWithin (http://wiki.apache.org/solr/CommitWithin, http://stackoverflow.com/questions/17475456/solr-issues-with-soft-auto-commit-near-real-time) as an alternative but have no idea how it works or what the implications are. What would be the best settings to achieve NRT search? Thanks. Mugoma.
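[Editor's note] For reference, CommitWithin is a per-update deadline rather than a server-side interval: each add request carries the maximum time (in milliseconds) within which its documents must become searchable. In the XML update format it looks like this (10000 ms and the field values are arbitrary examples):

```
<add commitWithin="10000">
  <doc>
    <field name="id">example-doc</field>
  </doc>
</add>
```

In SolrJ the equivalent is the add overload that takes a commit-within value, e.g. server.add(doc, 10000).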
Compound words
Hi, I'm an infant in the Solr/Lucene family, just a couple of months old. We are trying to find a way to combine words into a single compound word at index and query time. E.g. if a document has "sea bird" in it, it should be indexed as "seabird", and any query having "sea bird" in it should also look for "seabird", not only in qf but also in the pf, pf2, pf3 fields. We are using the edismax query parser. Our problem is not at index time, which we have handled by writing our own token filter, but at query time. Our token filter takes a dictionary of prefix,suffix pairs from a file and keeps emitting regular and compound tokens as it encounters them. We configured our own filter at query time, but figured that at query time individual clauses like field:sea, field:bird etc. are created first and then sent to the analyzer. First of all, can someone please confirm that this part of my understanding is correct? So, we are forced to emit sea and bird as individual tokens because we are not getting them in sequence at all. Is it possible to achieve this by means other than pre-processing the query before sending it to Solr? Can a CharFilter be used instead; are they applied before creating query clauses? I can keep providing more details as necessary. This mail has already crossed TL;DR limits for many :) Parvesh Garg http://www.zettata.com +91 963 222 5540
Re: Compound words
One more thing, Is there a way to remove my accidentally sent phone number in the signature from the previous mail? aarrrggghhh
Re: SolrCloud: optimizing a core triggers optimizations of all cores in that collection?
Thanks @Mark @Erick Should I create a JIRA issue for this ? - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-optimizing-a-core-triggers-optimization-of-another-tp4097499p4098020.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr search in case the first keyword are not index
I have solved it. Thanks. - Phat T. Dong -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-search-in-case-the-first-keyword-are-not-index-tp4097699p4098021.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Optimal interval for soft commit
How do you add the documents to the index: one by one, or in batches of n? When do you do your commits? Because 8k docs per day is not a lot. Depending on the above, committing with softCommit=true might also be a solution. - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Optimal-interval-for-soft-commit-tp4098016p4098022.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: One of all shard stopping, all shards stop
When one of your shards dies, your index becomes incomplete. By default querying is distributed over all shards (distrib=true), and if one of them (shard X) is down, you get an error stating that there are no servers hosting shard X. If the other shards are still up you can query them directly using distrib=false, but the result set will only contain documents from that shard. So you would have to query every active shard individually and then merge the results yourself. If I'm wrong please correct me. - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/One-of-all-shard-stopping-all-shards-stop-tp4098015p4098024.html Sent from the Solr - User mailing list archive at Nabble.com.
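[Editor's note] A minimal sketch of that manual fallback, assuming each live shard's distrib=false response has already been parsed into a list of doc dicts carrying a score (the dict shape and the "score" key are assumptions about your client code, not a Solr API):

```python
# Client-side merge of per-shard results fetched with distrib=false.
# Each inner list is one shard's parsed docs; we interleave them by
# descending score, which is what a distributed query would have done
# for a simple score-sorted request.

def merge_shard_results(per_shard_docs, rows=10):
    """Flatten the per-shard result lists and keep the top `rows` docs."""
    merged = [doc for shard in per_shard_docs for doc in shard]
    merged.sort(key=lambda doc: doc["score"], reverse=True)
    return merged[:rows]

shard1 = [{"id": "a", "score": 0.9}, {"id": "b", "score": 0.4}]
shard2 = [{"id": "c", "score": 0.7}]
print(merge_shard_results([shard1, shard2], rows=2))
```

Note that facet counts and numFound cannot be recovered this way; only the merged document list is.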
Re: Solr For
You're describing two different entities: Job and Employee. Since they are clearly different in every way, you will need two different cores with two different schemas. - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-For-tp4097928p4098025.html Sent from the Solr - User mailing list archive at Nabble.com.
Data import handler with multi tables
Hi, I want to import many tables from MySQL. Assume that I have two tables:
*** Table 1: tbl_tableA(id, nameA) with data (1, A1), (2, A2), (3, A3).
*** Table 2: tbl_tableB(id, nameB) with data (1, B1), (2, B2), (3, B3), (4, B4), (5, B5).
I configure:

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://xx" user="xxx" password="xxx" batchSize="1" />
  <document name="atexpats6">
    <entity name="tableA" query="select * from tbl_tableA">
      <field name="id" column="id"/>
      <field name="nameA" column="nameA" />
    </entity>
    <entity name="tableB" query="select * from tbl_tableB">
      <field name="id" column="id"/>
      <field name="nameA" column="nameA" />
    </entity>
  </document>
</dataConfig>

I define nameA, nameB in schema.xml and id is configured as <uniqueKey>id</uniqueKey>. When I import data via http://localhost:8983/solr/dataimport?command=full-import it is successful, but only the data from tbl_tableB gets indexed. I think this is because id is the unique key: tbl_tableA is imported first and tbl_tableB after, and tbl_tableB has ids that are the same as ids in tableA, so only tableB's data ends up indexed under each unique id. Can anyone help me configure the data import handler so that it indexes all data from two (or more) tables that use the same ids? Thanks. - Phat T. Dong -- View this message in context: http://lucene.472066.n3.nabble.com/Data-import-handler-with-multi-tables-tp4098026.html Sent from the Solr - User mailing list archive at Nabble.com.
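[Editor's note] One common fix (a sketch only; it assumes MySQL's concat function and that nothing else relies on the raw numeric ids) is to make the unique key distinct per table by prefixing it in each entity's query:

```
<entity name="tableA" query="select concat('A-', id) as id, nameA from tbl_tableA">
  <field name="id" column="id"/>
  <field name="nameA" column="nameA"/>
</entity>
<entity name="tableB" query="select concat('B-', id) as id, nameB from tbl_tableB">
  <field name="id" column="id"/>
  <field name="nameB" column="nameB"/>
</entity>
```

With distinct ids ('A-1' vs 'B-1'), rows from the second table no longer overwrite rows from the first.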
error in suggester component in solr
I am working on Solr autocomplete functionality. I am using Solr 4.5.0 to build my application, and I am following this link as a reference: http://lucene.472066.n3.nabble.com/auto-completion-search-with-solr-using-NGrams-in-SOLR-td3998559i20.html My suggest component is something like this:

<searchComponent class="solr.SpellCheckComponent" name="suggest">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <str name="storeDir">suggest</str>
    <str name="field">autocomplete_text</str>
    <bool name="exactMatchFirst">true</bool>
    <float name="threshold">0.005</float>
    <str name="buildOnCommit">true</str>
    <str name="buildOnOptimize">true</str>
  </lst>
  <lst name="spellchecker">
    <str name="name">jarowinkler</str>
    <str name="field">lowerfilt</str>
    <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
    <str name="spellcheckIndexDir">spellchecker</str>
  </lst>
  <str name="queryAnalyzerFieldType">edgytext</str>
</searchComponent>

But I am getting the following error:

org.apache.solr.spelling.suggest.Suggester - Loading stored lookup data failed
java.io.FileNotFoundException: /home/anurag/Downloads/solr-4.4.0/example/solr/collection1/data/suggest/tst.dat (No such file or directory)

It says that some file is missing, but the Solr wiki's suggester component page says it supports these lookupImpls, including org.apache.solr.spelling.suggest.tst.TSTLookup. I don't know what I am doing wrong. Any help will be deeply appreciated. -- View this message in context: http://lucene.472066.n3.nabble.com/error-in-suggester-component-in-solr-tp4098028.html Sent from the Solr - User mailing list archive at Nabble.com.
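[Editor's note] A hedged observation: that message is typically logged when storeDir is set but the stored lookup file has not been built yet, so on a fresh index there is nothing at data/suggest/tst.dat to load. Triggering a build once (assuming a /suggest request handler is wired to this component; the handler name and sample prefix here are placeholders) should create the file:

```
http://localhost:8983/solr/suggest?q=s&spellcheck=true&spellcheck.dictionary=suggest&spellcheck.build=true
```

With buildOnCommit=true, as configured above, subsequent commits then keep the stored data up to date.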
Re: Newbie to Solr
Hi Alex, I have been able to run a few simple queries with my own schema.xml and data file. My concern now is that I'm able to run queries like http://localhost:8983/solr/select/?q=*:* and http://localhost:8983/solr/select/?q=*:*&facet=true&facet.field=Name from the URL. However, when I try to run them like this: *:*&facet=true&facet.field=Name from the query string text box, it gives me an error like "undefined field *". Any idea what is going wrong? TIA

On Sun, Oct 27, 2013 at 1:28 PM, Mamta Alshi mamta.al...@gmail.com wrote: Hi Alex, That is what I am suspecting too. Trying to remove the other files from the exampledocs directory is not helping. After removing all files except details.xml, the results still show me data from the other files but not my file. I am making changes to the same path which is displayed in the Web Admin's dashboard. My last option will be to delete Solr, install it again and try. Thanks for your prompt response.

On Sun, Oct 27, 2013 at 1:04 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: Maybe your Solr instance is somehow using a different collection directory? In the Web Admin's dashboard section, it shows the path to where it thinks the instance is. Does it match what you expected? If it does, try deleting the core directory, restarting Solr and doing the indexing again. Maybe you have some old stuff there accidentally. Regards, Alex Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Sun, Oct 27, 2013 at 3:45 PM, Mamta Alshi mamta.al...@gmail.com wrote: Hi, On trying to create a new schema.xml it shows the schema from the Solr console. I have created a new data file called details.xml and placed it in the folder exampledocs. I have indexed just this one file from the command prompt. However, on my Solr console, when I query *:* it does not show me the contents of details.xml. It shows me the contents of some other data file. Am I missing out on something? TIA.

On Tue, Oct 1, 2013 at 3:16 PM, Kishan Parmar kishan@gmail.com wrote: Yes, you have to create your own schema, and in the schema file you have to add your XML files' field names. Likewise you can add your own field names, or you can add your fields to the default schema file. Without a schema you cannot add your XML file to Solr. My schema is like this:

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="example" version="1.5">
  <fields>
    <field name="No" type="string" indexed="true" stored="true" required="true" multiValued="false" />
    <field name="Name" type="string" indexed="true" stored="true" required="true" multiValued="false" />
    <field name="Address" type="string" indexed="true" stored="true" required="true" multiValued="false" />
    <field name="Mobile" type="string" indexed="true" stored="true" required="true" multiValued="false" />
  </fields>
  <uniqueKey>No</uniqueKey>
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
    <fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0" />
  </types>
</schema>

and my file is like this:

<add>
  <doc>
    <field name="No">100120107088</field>
    <field name="Name">kishan</field>
    <field name="Address">ghatlodia</field>
    <field name="Mobile">9510077394</field>
  </doc>
</add>

Regards, Kishan Parmar Software Developer +91 95 100 77394 Jay Shree Krishnaa !!

On Tue, Oct 1, 2013 at 1:11 AM, mamta mamta.al...@gmail.com wrote: Hi, I want to know: if I have to fire some query through the Solr admin, do I need to create a new schema.xml? Where do I place it in case I have to create a new one? In case I can edit the original schema.xml, can there be two fields named id in my schema.xml? I desperately need help in running queries on the Solr admin, which is configured on a Tomcat server. What preparation will I need to do? Schema.xml? Any docs? Any help will be highly appreciated. Thanks, Mamta -- View this message in context: http://lucene.472066.n3.nabble.com/Newbie-to-Solr-tp4092876.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Newbie to Solr
Put *:* in the q field. Then check the facet check box (look lower, close to the Execute button) and in facet.field insert Name. This should do the trick. - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Newbie-to-Solr-tp4092876p4098031.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: AW: AW: auto completion search with solr using NGrams in SOLR
Hi ... I am trying to build autocomplete functionality using your post, but I am getting the following error:

2577 [coreLoadExecutor-3-thread-1] WARN org.apache.solr.spelling.suggest.Suggester - Loading stored lookup data failed
java.io.FileNotFoundException: /home/anurag/Downloads/solr-4.4.0/example/solr/collection1/data/suggest/tst.dat (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:137)
at org.apache.solr.spelling.suggest.Suggester.init(Suggester.java:116)
at org.apache.solr.handler.component.SpellCheckComponent.inform(SpellCheckComponent.java:623)
at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:601)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:830)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:629)

I am using Solr 4.4. Does the suggester component still work in this version? -- View this message in context: http://lucene.472066.n3.nabble.com/auto-completion-search-with-solr-using-NGrams-in-SOLR-tp3998559p4098032.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Background merge errors with Solr 4.4.0 on Optimize call
For Tomcat, Solr's output often goes into catalina.out by default, so the log output might be there. You can configure Solr to send the logs almost anywhere you please, but without some specific setup on your part the log output just goes to the servlet container's default. I took a quick glance at the code, but since the merges are happening in the background, there's not much context for where that error is thrown. How much memory is there for the JVM? I'm grasping at straws a bit... Erick

On Sun, Oct 27, 2013 at 9:54 PM, Matthew Shapiro m...@mshapiro.net wrote: I am working on implementing Solr as the search backend for our web system. So far things have been going well, but today I made some schema changes and now things have broken. I updated the schema.xml file and reloaded the core (via the admin interface). No errors were reported in the logs. I then pushed 100 records to be indexed. A call to commit afterwards seemed fine; however, my next call to optimize caused the following errors:

java.io.IOException: background merge hit exception: _2n(4.4):C4263/154 _30(4.4):C134 _32(4.4):C10 _31(4.4):C10 into _37 [maxNumSegments=1]
null:java.io.IOException: background merge hit exception: _2n(4.4):C4263/154 _30(4.4):C134 _32(4.4):C10 _31(4.4):C10 into _37 [maxNumSegments=1]

Unfortunately, googling for "background merge hit exception" came up with two things: a corrupt index or not enough free space. The host machine that's hosting Solr has 227 out of 229GB free (according to df -h), so that's not it. I then ran CheckIndex on the index and got the following results: http://apaste.info/gmGU As someone who is new to Solr and Lucene, as far as I can tell this means my index is fine. So I am at a loss. I'm fairly sure I could delete my data directory and rebuild it, but I am more interested in finding out why it is having issues, what is the best way to fix it, and what is the best way to prevent it from happening when this goes into production. Does anyone have any advice that may help? As an aside, I do not have a stacktrace for you because the Solr admin page isn't giving me one. I tried looking in the logs directory under my Solr directory, but it does not contain any logs. I opened up my ~/tomcat/lib/log4j.properties file and saw http://apaste.info/0rTL, which didn't really help me find log files. Doing a 'find . | grep solr.log' didn't really help either. Any help finding the log files (which may help find the actual cause of this) would also be appreciated.
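[Editor's note] On the logging side, a minimal log4j.properties sketch that sends Solr's output to a known file (the file path is a placeholder; adjust to taste):

```
log4j.rootLogger=INFO, file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=/var/log/solr/solr.log
log4j.appender.file.MaxFileSize=10MB
log4j.appender.file.MaxBackupIndex=9
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{ISO8601} %-5p (%t) [%c] %m%n
```

With this in ~/tomcat/lib/log4j.properties (and log4j plus the slf4j bindings on the classpath, as the standard Solr-on-Tomcat setup requires), stack traces like the background-merge one should land in that file.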
Re: Newbie to Solr
Hi Michael, Thanks for the prompt response. Have a look at my attached admin user interfaces. I do not quite see the options you mention. On Mon, Oct 28, 2013 at 2:18 PM, michael.boom my_sky...@yahoo.com wrote: Put *:* in the q field Then check the facet check box (look lower close to the Execute button) and in the facet.field insert Name. This should do the trick. - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Newbie-to-Solr-tp4092876p4098031.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Newbie to Solr
how do I get the solr admin web user interface? On Mon, Oct 28, 2013 at 2:32 PM, Mamta Alshi mamta.al...@gmail.com wrote: Hi Michael, Thanks for the prompt response. Have a look at my attached admin user interfaces. I do not quite see the options you mention. On Mon, Oct 28, 2013 at 2:18 PM, michael.boom my_sky...@yahoo.com wrote: Put *:* in the q field Then check the facet check box (look lower close to the Execute button) and in the facet.field insert Name. This should do the trick. - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Newbie-to-Solr-tp4092876p4098031.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: When is/should qf different from pf?
The facetious answer is: when phrases aren't important in the fields. If you're doing a simple boolean match, adding phrase fields adds expense to no good purpose. Phrases on numeric fields seem wrong, etc. FWIW, Erick On Mon, Oct 28, 2013 at 1:03 AM, Amit Nithian anith...@gmail.com wrote: Hi all, I have been using Solr for years but never really stopped to wonder: when using the dismax/edismax handler, when do you have qf different from pf? I have always set them to be the same (maybe with different weights), but I was wondering if there is a situation where you would have a field in qf that is not in pf, or vice versa. My understanding from the docs is that qf is a term-wise hard filter while pf is a phrase-wise boost of documents that made it past the qf filter. Thanks! Amit
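[Editor's note] A concrete shape for that asymmetry (field names and boosts are made up for illustration):

```
defType=edismax
qf=title^2 body sku
pf=title^5 body^2
```

Here sku, an exact-match identifier field, contributes to matching via qf but is left out of pf, since phrase proximity on a single-token id adds cost without improving ranking.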
Re: Solr Update URI is not found
This seems like a better question for the Nutch list. I see hadoop in there, so unless you've specifically configured Solr to use the HDFS directory writer factory, this has to be coming from someplace else. And there are map/reduce tasks in here. BTW, it would be more helpful if you posted the URL that you successfully queried Solr with... What is the /2 on the end for? Do you use that when you query? Best, Erick

On Mon, Oct 28, 2013 at 2:37 AM, Bayu Widyasanyata bwidyasany...@gmail.com wrote: On Mon, Oct 28, 2013 at 1:26 PM, Raymond Wiker rwi...@gmail.com wrote: request: http://localhost:8080/solr/update?wt=javabin&version=2 I think this URL is incorrect: there should be a core name between solr and update. I changed the Solr URL in the crawl script's options to: ./bin/crawl urls/seed.txt TestCrawl http://localhost:8080/solr/mycollection/2 And the result now is Bad Request. I will look for other misconfigurations...

=
org.apache.solr.common.SolrException: Bad Request
Bad Request
request: http://localhost:8080/solr/mycollection/update?wt=javabin&version=2
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
at org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:155)
at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:118)
at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:44)
at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.close(ReduceTask.java:467)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:535)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
2013-10-28 13:30:02,804 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:123)
at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:185)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:195)
-- wassalam, [bayu]
Re: Newbie to Solr
I don't see the mentioned attachment. Try using http://snag.gy/ to provide it. As for where you find it, the default is http://localhost:8983/solr/collection1/query - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Newbie-to-Solr-tp4092876p4098041.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Optimal interval for soft commit
Hello, "How do you add the documents to the index - one by one, batches of n?" Documents are added one by one using SolrJ. "When do you do your commits?" We have the following settings in solrconfig.xml:

<autoCommit>
  <maxTime>180</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>15000</maxTime>
</autoSoftCommit>

Thanks. Mugoma. On Mon, October 28, 2013 12:22 pm, michael.boom wrote: How do you add the documents to the index - one by one, batches of n? When do you do your commits? Because 8k docs per day is not a lot. Depending on the above, committing with softCommit=true might also be a solution. - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Optimal-interval-for-soft-commit-tp4098016p4098022.html Sent from the Solr - User mailing list archive at Nabble.com.
Apache-Solr with Tomcat: displaying the format of search result
Hi All, Recently I integrated Apache Solr with a Tomcat server; everything is working fine. I am displaying the search results using a Velocity template, but here is my problem: I want the search results to display in the same format as the input data. For example, the input data (all contained in a single field) is:

*issue*: description about issue.
*Solution*: solution given by user goes here.

but after indexing, the data displays in the search result in the format below:

*issue*: description about issue.*Solution*: solution given by user goes here.

This is not what I want; I want to display the data in the same format as the input. Can anyone please help with this? Thanks in advance... -- View this message in context: http://lucene.472066.n3.nabble.com/Apache-Solr-with-Tomcat-displaying-the-format-of-search-result-tp4098040.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Compound words
Why did you reject using synonyms? You can have multi-word synonyms just fine at index time, and at query time, since the multiple words are already substituted in the index you don't need to do the same substitution, just query the raw strings. I freely acknowledge you may have very good reasons for doing this yourself, I'm just making sure you know what's already there. See: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory Look particularly at the explanations for sea biscuit in that section. Best, Erick On Mon, Oct 28, 2013 at 3:47 AM, Parvesh Garg parv...@zettata.com wrote: One more thing, Is there a way to remove my accidentally sent phone number in the signature from the previous mail? aarrrggghhh
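[Editor's note] For the "sea bird" example from the original question, an index-time-only setup along the lines Erick describes might look like this (a sketch; the field type name, tokenizer choice, and file name are placeholders). In synonyms.txt:

```
seabird, sea bird
```

and in schema.xml, with the synonym filter applied only in the index analyzer and expand="true" so both the compound and the separate words get indexed, letting either query form match without any query-time substitution:

```
<fieldType name="text_compound" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```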
Re: Optimal interval for soft commit
To reply to your original question: when you soft commit, the top-level caches are thrown away, i.e. the filterCache, queryResultCache, documentCache - all the ones in solrconfig.xml. And if you have a high autowarm count on them, you wind up doing a lot of work for no gain. Say your soft commit interval is 1 second: only queries that come in during that one second even _potentially_ use the caches. Here's a long blog with lots of background: http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ Try this:
1. set your soft commit interval to 1 second
2. set your cache sizes in solrconfig to 5
3. set the autowarm counts on those caches to 0
Try it. If you see unacceptable degradation in query performance, then this is too aggressive and you need some caching. If not, don't bother caching. As always, it's a tradeoff between how fast docs are searchable and how much you can improve things with caching. Best, Erick On Mon, Oct 28, 2013 at 6:42 AM, Mugoma Joseph O. mug...@yengas.com wrote: Hello, How do you add the documents to the index - one by one, batches of n? Documents are added one by one using SolrJ. When do you do your commits? We have the following settings in solrconfig.xml: <autoCommit> <maxTime>180</maxTime> <openSearcher>false</openSearcher> </autoCommit> <autoSoftCommit> <maxTime>15000</maxTime> </autoSoftCommit> Thanks. Mugoma. On Mon, October 28, 2013 12:22 pm, michael.boom wrote: How do you add the documents to the index - one by one, batches of n? When do you do your commits? Because 8k docs per day is not a lot. Depending on the above, committing with softCommit=true might also be a solution. - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Optimal-interval-for-soft-commit-tp4098016p4098022.html Sent from the Solr - User mailing list archive at Nabble.com.
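[Editor's note] Those three steps as a solrconfig.xml sketch (the sizes come from the advice above; the cache declarations mirror the stock example config):

```
<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>

<filterCache class="solr.FastLRUCache" size="5" initialSize="5" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="5" initialSize="5" autowarmCount="0"/>
<documentCache class="solr.LRUCache" size="5" initialSize="5" autowarmCount="0"/>
```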
Re: Solr Update URI is not found
Hi Erick and All, The problem is solved by copying schema-solr4.xml into my collection's Solr conf (renamed to schema.xml). I didn't use hadoop there, and I apologize; I thought it appropriate to post on this Solr list since the problem first appeared at the Solr indexing step. Regarding the /2 option, it's e-mail body evolution, I think :) In my first posting it was the crawl script syntax, as in my case:

# ./bin/crawl urls/seed.txt TestCrawl http://localhost:8080/solr/ 2

2 = the number of rounds. See here: http://wiki.apache.org/nutch/NutchTutorial#A3.3._Using_the_crawl_script Again, thanks everyone!

On Mon, Oct 28, 2013 at 5:39 PM, Erick Erickson erickerick...@gmail.com wrote: This seems like a better question for the Nutch list. I see hadoop in there, so unless you've specifically configured Solr to use the HDFS directory writer factory, this has to be coming from someplace else. And there are map/reduce tasks in here. BTW, it would be more helpful if you posted the URL that you successfully queried Solr with... What is the /2 on the end for? Do you use that when you query? Best, Erick

On Mon, Oct 28, 2013 at 2:37 AM, Bayu Widyasanyata bwidyasany...@gmail.com wrote: On Mon, Oct 28, 2013 at 1:26 PM, Raymond Wiker rwi...@gmail.com wrote: request: http://localhost:8080/solr/update?wt=javabin&version=2 I think this URL is incorrect: there should be a core name between solr and update. I changed the Solr URL in the crawl script's options to: ./bin/crawl urls/seed.txt TestCrawl http://localhost:8080/solr/mycollection/2 And the result now is Bad Request. I will look for other misconfigurations...

=
org.apache.solr.common.SolrException: Bad Request
Bad Request
request: http://localhost:8080/solr/mycollection/update?wt=javabin&version=2
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
at org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:155)
at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:118)
at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:44)
at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.close(ReduceTask.java:467)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:535)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
2013-10-28 13:30:02,804 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:123)
at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:185)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:195)
-- wassalam, [bayu]
-- wassalam, [bayu]
Solr 4.5.1 replication Bug? Illegal to have multiple roots (start tag in epilog?).
We have a similar error as this thread: http://www.mail-archive.com/solr-user@lucene.apache.org/msg90748.html We tried the Tomcat settings from this post, using the exact settings specified there. We merge 500 documents at a time. I am creating a new thread because Michael is using Jetty whereas we use Tomcat. The formdataUploadLimitInKB and multipartUploadLimitInKB limits are set to a very high value, 2GB, as suggested in the following thread: https://issues.apache.org/jira/browse/SOLR-5331 We use out-of-the-box Solr 4.5.1, no customization done. If we merge documents via SolrJ to a single server it works perfectly fine, but as soon as we add another node to the cloud we get the following while merging documents. This is the error we are getting on the server where merging is happening (10.10.10.116 - the IPs are irrelevant, just for clarity). 10.10.10.119 is the new node here. This server gets a RemoteSolrException: shard update error StdNode: http://10.10.10.119:8980/solr/mycore/:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Illegal to have multiple roots (start tag in epilog?).
	at [row,col {unknown-source}]: [1,12468]
	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:425)
	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
	at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:401)
	at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:1)
	at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
	at java.util.concurrent.FutureTask.run(Unknown Source)
	at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
	at java.util.concurrent.FutureTask.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)
On the other server, 10.10.10.119, we get the following error:
org.apache.solr.common.SolrException: Illegal to have multiple roots (start tag in epilog?).
	at [row,col {unknown-source}]: [1,12468]
	at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:176)
	at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
	at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
	at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:936)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
	at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1004)
	at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
	at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
Caused by: com.ctc.wstx.exc.WstxParsingException: Illegal to have multiple roots (start tag in epilog?).
	at [row,col {unknown-source}]: [1,12369]
	at com.ctc.wstx.sr.StreamScanner.constructWfcException(StreamScanner.java:630)
	at com.ctc.wstx.sr.StreamScanner.throwParseError(StreamScanner.java:461)
	at com.ctc.wstx.sr.BasicStreamReader.handleExtraRoot(BasicStreamReader.java:2155)
	at com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:2070)
	at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2647)
Field Value depending on another field value
Hello, I'm pretty new to Solr, and I have a question about best practice. I want to handle a Solr collection with products that are available in different shops. For several reasons, the price of a product may be the same or may vary, depending on the shop's location. What I don't know how to handle correctly is the ability to have a price that is a multivalued notion, whose value depends on another field. Imagine the following product in the collection:
{
  "id": 123456,
  "name": "The Wonderful product",
  "SellableInShop": [1, 3],
  "Price": 0,
  "PriceInShop1": 34.99,
  "PriceInShop2": 0,
  "PriceInShop3": 38.99
}
Behaviour I want when the user searches for "wonderful" after selecting shop #3:
/query?q=wonderful AND SellableInShops:3
{
  "id": 123456,
  "name": "The Wonderful product",
  "SellableInShop": [1, 3],
  "Price": 38.99
}
My question is: how do I fill, at query time, the content of a field Price depending on 2 other fields: SellableInShop and PriceInShop3 (PriceInShop2 if SellableInShop == 2, PriceInShop1 if SellableInShop == 1, etc.)? Thanks a lot, Ben -- View this message in context: http://lucene.472066.n3.nabble.com/Field-Value-depending-on-another-field-value-tp4098047.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Data import handler with multi tables
I think because id is unique. When importing, tbl_tableA imports first and tbl_tableB imports after; tbl_tableB has the same ids as tableA, so only the data of tableB ends up indexed under each unique id. That's exactly what happens here :) If the second table had fewer records than the first one, you'd still see records from that table. Can anyone help me configure the data import handler so that it can index all data of two (or more) tables which have the same ids? That requires the use of a compound key (http://en.wikipedia.org/wiki/Compound_key), e.g. if the data comes from table A, make the key A1 instead of (only) 1, then A2, B1, B2, and so on. You can still index the raw ids in another field, but for the unique key you need something like that to get it working. HTH Stefan On Monday, October 28, 2013 at 10:45 AM, dtphat wrote: Hi, I want to import many tables from MySQL. Assume that I have two tables: *** Table 1: tbl_tableA(id, nameA) with data (1, A1), (2, A2), (3, A3). *** Table 2: tbl_tableB(id, nameB) with data (1, B1), (2, B2), (3, B3), (4, B4), (5, B5). I configure:
<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://xx" user="xxx" password="xxx" batchSize="1"/>
  <document name="atexpats6">
    <entity name="tableA" query="select * from tbl_tableA">
      <field name="id" column="id"/>
      <field name="nameA" column="nameA"/>
    </entity>
    <entity name="tableB" query="select * from tbl_tableB">
      <field name="id" column="id"/>
      <field name="nameB" column="nameB"/>
    </entity>
  </document>
</dataConfig>
I define nameA and nameB in schema.xml, and id is configured as <uniqueKey>id</uniqueKey>. When I import data via http://localhost:8983/solr/dataimport?command=full-import it is successful, but only the data of tbl_tableB gets indexed. I think because id is unique: when importing, tbl_tableA imports first and tbl_tableB imports after; tbl_tableB has the same ids as tableA, so only the data of tableB ends up indexed under each unique id.
Can anyone help me configure the data import handler so that it can index all data of two (or more) tables which have the same ids? Thanks. - Phat T. Dong -- View this message in context: http://lucene.472066.n3.nabble.com/Data-import-handler-with-multi-tables-tp4098026.html Sent from the Solr - User mailing list archive at Nabble.com (http://Nabble.com).
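Stefan's compound-key suggestion could be sketched in data-config.xml roughly like this — the CONCAT calls and the extra rawId field are illustrative assumptions, not tested configuration:

```xml
<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://xx" user="xxx" password="xxx" batchSize="1"/>
  <document>
    <!-- Prefix the source table to the id so keys from the two tables never collide -->
    <entity name="tableA"
            query="SELECT CONCAT('A', id) AS id, id AS rawId, nameA FROM tbl_tableA">
      <field name="id" column="id"/>
      <field name="rawId" column="rawId"/>
      <field name="nameA" column="nameA"/>
    </entity>
    <entity name="tableB"
            query="SELECT CONCAT('B', id) AS id, id AS rawId, nameB FROM tbl_tableB">
      <field name="id" column="id"/>
      <field name="rawId" column="rawId"/>
      <field name="nameB" column="nameB"/>
    </entity>
  </document>
</dataConfig>
```

With this, tableA yields ids A1..A3 and tableB yields B1..B5, so all eight rows survive the unique-key check; the original numeric id remains searchable via rawId, which would need its own field definition in schema.xml.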
Re: Compound words
Hi Erick, Thanks for the suggestion. Like I said, I'm an infant. We tried synonyms both ways, "sea biscuit => seabiscuit" and "seabiscuit => sea biscuit", and didn't understand exactly how it worked. But I just checked the analysis tool, and it seems to work perfectly fine at index time. Now I can happily discard my own filter and 4 days of work. I'm happy I got to know a few ways on how/when not to write a Solr filter :) I tried the string "sea biscuit sea bird" with expand=false and the tokens I got were seabiscuit, sea, bird at positions 1, 2 and 3 respectively. But at query time, when I enter the same term "sea biscuit sea bird", using edismax with qf, pf2, and pf3, the parsedQuery looks like this: +((text:sea) (text:biscuit) (text:sea) (text:bird)) ((text:"biscuit sea") (text:"sea bird")) ((text:"seabiscuit sea") (text:"biscuit sea bird")) What I wanted instead was this: +((text:seabiscuit) (text:sea) (text:bird)) ((text:"seabiscuit sea") (text:"sea bird")) (text:"seabiscuit sea bird") Looks like there isn't any other way than to pre-process the query myself and create the compound word. What do you mean by "just query the raw string"? Am I still missing something? Parvesh Garg http://www.zettata.com (This time I did remove my phone number :) ) On Mon, Oct 28, 2013 at 4:14 PM, Erick Erickson erickerick...@gmail.com wrote: Why did you reject using synonyms? You can have multi-word synonyms just fine at index time, and at query time, since the multiple words are already substituted in the index you don't need to do the same substitution, just query the raw strings. I freely acknowledge you may have very good reasons for doing this yourself, I'm just making sure you know what's already there. See: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory Look particularly at the explanations for "sea biscuit" in that section.
Best, Erick On Mon, Oct 28, 2013 at 3:47 AM, Parvesh Garg parv...@zettata.com wrote: One more thing, Is there a way to remove my accidentally sent phone number in the signature from the previous mail? aarrrggghhh
Re: One of all shard stopping, all shards stop
Thanks for your reply. If one of the servers has stopped with an error, this option (distrib=false) works well. A similar option is shards.tolerant=true, but I don't want to use it, because the dead server doesn't show an error message; it just returns no data. I want the dead server to show an error message, while the other servers keep working normally. -- View this message in context: http://lucene.472066.n3.nabble.com/One-of-all-shard-stopping-all-shards-stop-tp4098015p4098053.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Data import handler with multi tables
Hi, is there another way to import all the data in this case, instead of only using a compound key? Thanks. - Phat T. Dong -- View this message in context: http://lucene.472066.n3.nabble.com/Re-Data-import-handler-with-multi-tables-tp4098048p4098056.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Compound words
Consider setting expand=true at index time. That puts all the tokens in your index, and then you may not need any synonym processing at query time, since all the variants will already be in the index. As it is, you've replaced the words in the original with synonyms, essentially collapsing them down to a single word, and then you have to do something at query time to get matches. If all the variants are in the index, you shouldn't have to. That's what I meant by "raw". Best, Erick On Mon, Oct 28, 2013 at 8:02 AM, Parvesh Garg parv...@zettata.com wrote: [...]
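For reference, an index-time expansion setup along the lines Erick describes might look like this; the field type name, tokenizer choice, and synonyms.txt contents are illustrative, not the poster's actual configuration:

```
# synonyms.txt — with expand="true", both forms end up in the index
sea biscuit, seabiscuit
```

```xml
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Because the expansion already happened at index time, the query analyzer deliberately has no SynonymFilterFactory, which sidesteps the well-known multi-word synonym problems at query time.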
Re: One of all shard stopping, all shards stop
I think if you set shards.tolerant=true you get information in the return packet if a shard is completely down. The other thing you can do is query the ZooKeeper cluster state directly. But I have to ask why you're not using a replica or two per shard. That should provide automatic fail-over etc. and make the necessity of dealing with this case _much_ less frequent. Personally I'd put more effort into making an always-up cluster than into dealing with when a single node goes down. FWIW, Erick On Mon, Oct 28, 2013 at 8:10 AM, hongkeun.yoo hunter...@naver.com wrote: [...]
return value from SolrJ client to php
Hello All, I have a requirement where I have to connect to Solr using the SolrJ client, and the documents returned by Solr to the SolrJ client have to be returned to PHP. I know it's simple to get documents from Solr to SolrJ, but how do I return documents from SolrJ to PHP? Thanks, Amit Aggarwal
Re: Field Value depending on another field value
Hi Ben, You can actually look at indexing single-valued documents, i.e. a different one for every store, and then grouping on the product id. Have a look at this presentation by Adrian Trenaman from Lucene Revolution earlier this year: Presentation: http://www.slideshare.net/trenaman/personalized-search-on-the-largest-flash-sale-site-in-america Video: http://www.youtube.com/watch?v=kJa-3PEc90g Hope that helps you. On Mon, Oct 28, 2013 at 5:06 PM, bengates benga...@aliceadsl.fr wrote: [...] -- Anshum Gupta http://www.anshumgupta.net
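Concretely, the denormalized modelling Anshum describes would store one document per (product, shop) pair and collapse at query time; all field names below are illustrative, not from the original post:

```
# One doc per product/shop combination:
#   { product_id: 123456, name: "The Wonderful product", shop_id: 3, price: 38.99 }
#   { product_id: 123456, name: "The Wonderful product", shop_id: 1, price: 34.99 }

# Filter by the selected shop, then group so each product appears once,
# carrying its shop-local price:
/query?q=wonderful&fq=shop_id:3&group=true&group.field=product_id
```

Each returned group member is a single-valued document, so "Price" is simply the price field of the matching per-shop document; no query-time field rewriting is needed.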
Re: return value from SolrJ client to php
Hi Amit, I haven't personally tried it, but have a look at the options listed here: http://wiki.apache.org/solr/IntegratingSolr Also, just check that the library you try is known to work with the version of Solr you want to use. Otherwise, how about just using a serialization library for the apps in the 2 languages to talk to each other? On Mon, Oct 28, 2013 at 7:03 PM, Amit Aggarwal amit.aggarwa...@gmail.com wrote: [...] -- Anshum Gupta http://www.anshumgupta.net
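One more option worth noting: Solr ships PHP response writers out of the box, so PHP can often consume Solr output directly without SolrJ in the middle. A sketch, with the URL and core name purely illustrative:

```
# wt=php returns a PHP array literal; wt=phps returns output compatible
# with PHP's unserialize():
http://localhost:8983/solr/collection1/select?q=*:*&wt=phps

# On the PHP side, something along the lines of:
#   $result = unserialize(file_get_contents($url));
```

If SolrJ must stay in the path (e.g. for business logic), the other straightforward route is the one Anshum hints at: serialize QueryResponse.getResults() to JSON on the Java side and decode it with json_decode() in PHP.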
Re: Proposal for new feature, cold replicas, brainstorming
On Sat, 2013-10-26 at 02:14 +0200, Chris Hostetter wrote: I suspect that the most straightforward way to achieve what you folks seem to be describing would be to add a hook into the request distribution processing so that you could have a custom plugin used when Solr does Replica r = pickReplica(shardName), and your implementation of pickReplica() would look something like (all pseudo code):
List<Replica> allInShard = clusterState.getAllLiveReplicas(shardName);
List<Replica> candidates = new List();
for (Replica r : allInShard) {
  if (! r.hasRole(shardIsLastResort)) {
    candidates.add(r);
  }
}
return candidates.isEmpty() ? allInShard : candidates;
I am not very familiar with the distribution code in Solr. I located CloudSolrServer.request(SolrRequest request), which seems to be the place you are talking about? It extracts replica URLs and generates a LBHttpSolrServer.Req with that list, which is immediately used with the LBHttpSolrServer. As I understand it, feeding LBHttpSolrServer.Req with only shards that are primary would mean an exception if those shards do not answer. In order to handle the first search against a failed primary shard gracefully, wouldn't we need to extend LBHttpSolrServer.Req to have two lists, primary and lastResort, instead of one? This would also require a rewrite of the try-retry logic in LBHttpSolrServer. ...if I remember correctly, there is already a hook (or there is an issue about adding a hook) to let you do plugin logic like this -- [...] I did not see one in the code and could not locate a JIRA issue. Not that that means it isn't there. Thank you for your time, Toke Eskildsen
Re: Compound words
Hi Parvesh, I think you should check the following jira: https://issues.apache.org/jira/browse/SOLR-5379. You will find there links to other possible solutions/problems :-) Roman On 28 Oct 2013 09:06, Erick Erickson erickerick...@gmail.com wrote: [...]
Re: Solr - what's the next big thing?
Hi, On Sun, Oct 27, 2013 at 2:57 PM, Saar Carmi saarca...@gmail.com wrote: If I get it right, Solr can store its data files on HDFS but it will not Correct. And can be used to build indices in parallel, using MapReduce, from data living on HDFS. use map reduce to process the data (e.g. evaluating queries). Right. MapReduce jobs are typically not a sub-second process, while search queries typically need to be very quick. That said, one could run a query and then apply MapReduce-based processing on the search results. There is no support for that in Solr today. I was wondering whether Solr could utilize the Hadoop job distribution mechanism to utilize resources better. On the other hand, maybe this is not needed with the availability of SolrCloud. Maybe you are thinking of Solr on YARN? Mark Miller can probably say a word or two or three on this topic. Bill Bell, could you elaborate about complex object indexing? Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Sat, Oct 26, 2013 at 10:04 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, On Sat, Oct 26, 2013 at 5:58 AM, Saar Carmi saarca...@gmail.com wrote: LOL, Jack. I can imagine Otis saying that. Funny indeed, but not really. Otis, with this marriage, are we going to see map-reduce based queries? Can you please describe what you mean by that? Maybe with an example. Thanks, Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Oct 25, 2013 10:03 PM, Jack Krupansky j...@basetechnology.com wrote: But a lot of that big yellow elephant stuff is in 4.x anyway. (Otis: I was afraid that you were going to say that the next big thing in Solr is... Elasticsearch!) -- Jack Krupansky -Original Message- From: Otis Gospodnetic Sent: Friday, October 25, 2013 2:43 PM To: solr-user@lucene.apache.org Subject: Re: Solr - what's the next big thing?
Saar, The marriage with the big yellow elephant is a big deal. It changes the scale. Otis Solr ElasticSearch Support http://sematext.com/ On Oct 25, 2013 5:32 AM, Saar Carmi saarca...@gmail.com wrote: If I am not mistaken the most impressive improvement of Solr 4.0 compared to previous versions was the Solr Cloud architecture. What would be the next big thing in Solr 5.0 ? Saar -- Saar Carmi Mobile: 054-7782417 Email: saarca...@gmail.com
Re: Need idea to standardize keywords - ring tone vs ringtone
Thanks for your response, Erick. Sorry for the confusion. I currently display both 'ring tone' as well as 'ringtone' when the user types in 'r', but I am trying to figure out a way to display just 'ringtone', hence I added 'ring tone' to the stopwords list so that it doesn't get indexed. I have a list of known keywords (more like synonyms) which I am trying to map against the user-entered keywords: ring tone, ringer tone => ringtone -- View this message in context: http://lucene.472066.n3.nabble.com/Need-idea-to-standardize-keywords-ring-tone-vs-ringtone-tp4097794p4098103.html Sent from the Solr - User mailing list archive at Nabble.com.
Replace document title with filename if it's empty
Hi, I just found that some of the PDF files crawled have no (empty) 'title' metadata. How can I define or fetch the filename, and use it to replace the empty 'title' field? I didn't find a filename field in schema.xml, and I don't know how to make a conditional for the above condition (if title is empty then ...). Thanks in advance. -- wassalam, [bayu]
Re: Apache-Solr with Tomcat: displaying the format of search result
On 10/28/2013 4:40 AM, pyramesh wrote: But this is not I want.. I want to display data as same as input format. can anyone please help on this What Solr outputs in its fields for search results is identical to what it receives when data is indexed, unless you have update processors configured that change the data. The analysis chain that you define in schema.xml is *NOT* applied to stored data, only indexed data. If the search results are not coming out in the format that you want, it is either arriving at Solr incorrectly, or you have one or more update processors that are changing it. Thanks, Shawn
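The indexed-vs-stored distinction Shawn describes can be seen in a single schema.xml line; the field and type names here are illustrative:

```xml
<!-- The analyzer chain attached to text_general affects only the *indexed*
     terms used for matching; the *stored* value returned in search results
     is the verbatim input that was sent at index time. -->
<field name="description" type="text_general" indexed="true" stored="true"/>
```

So lowercasing, stemming, tokenization, etc. never alter what comes back in the response; only an update processor (which runs before the document is stored) can do that.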
Re: Need idea to standardize keywords - ring tone vs ringtone
Do you know about the Solr synonym feature? That seems more applicable to what you're describing than stopwords. I'd stay away from stopwords entirely here, and try to do what you want with synonyms. Multi-word synonyms can be tricky; I'm not entirely sure of the right way to do it for this use case. But I think the synonym feature is what you want, not the stopwords feature. On 10/28/13 12:24 PM, Developer wrote: [...]
Re: When is/should qf different from pf?
Thanks Erick. Numeric fields make sense, as I guess would strictly string fields too, since it's one term? In the normal text-searching case, though, does it make sense to have qf and pf differ? Thanks Amit On Oct 28, 2013 3:36 AM, Erick Erickson erickerick...@gmail.com wrote: The facetious answer is when phrases aren't important in the fields. If you're doing a simple boolean match, adding phrase fields will add expense to no good purpose. Phrases on numeric fields seem wrong. FWIW, Erick On Mon, Oct 28, 2013 at 1:03 AM, Amit Nithian anith...@gmail.com wrote: Hi all, I have been using Solr for years but never really stopped to wonder: When using the dismax/edismax handler, when do you have the qf different from the pf? I have always set them to be the same (maybe different weights) but I was wondering if there is a situation where you would have a field in the qf not in the pf or vice versa. My understanding from the docs is that qf is a term-wise hard filter while pf is a phrase-wise boost of documents that made it past the qf filter. Thanks! Amit
Re: When is/should qf different from pf?
There'd be no point having them the same. You're likely to include boosts in your pf, so that docs that match the phrase query as well as the term query score higher than those that just match the term query. Such as: qf=text description&pf=text^2 description^4 Upayavira On Mon, Oct 28, 2013, at 05:44 PM, Amit Nithian wrote: [...]
Solr block join
Hi, The block join feature introduced in Solr 4.5 is really helpful in solving some of the issues in my project. I am able to get it working in simple cases. However, I couldn't figure out how to use it in some more complex cases, and I could find very little reference material about it. 1) How do I return both parent document fields and child document fields in the same result (in SolrJ)? 2) How do I apply 'OR' to multiple child document types (searching for documents that meet the conditions of either child document type 1 or child document type 2)? 3) If result/sort/facet fields come from child documents, how do I define them in the schema? What I can think of is to create a copyField for each of them in the parent documents. Is there a better way? 4) Does block join work for multiple child levels, such as child and grandchild documents, etc.? Has anyone had similar issues and would like to share your solutions? Thanks, Simon -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-block-join-tp4098128.html Sent from the Solr - User mailing list archive at Nabble.com.
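For anyone finding this thread, the 4.5 block join query parsers are used roughly like this; the doc_type discriminator field and the other field names are illustrative assumptions, and the which/of clause must match all (and only) parent documents in the index:

```
# Return parent docs whose children match (block join "to parent"):
q={!parent which="doc_type:parent"}child_field:foo

# Return child docs whose parents match (block join "to child"):
q={!child of="doc_type:parent"}parent_field:bar
```

Parents and their children must have been indexed together as one block (e.g. nested documents in a single update) for either parser to work.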
Re: Solr 4.5.1 replication Bug? Illegal to have multiple roots (start tag in epilog?).
Hey, this is Michael, who was having the exact error on the Jetty side with an update. I've upgraded jetty from the 4.5.1 embedded version (in the example directory) to version 9.0.6, which means I had to upgrade my OpenJDK from 1.6 to 1.7.0_45. Also, I added the suggested (very large) settings to my solrconfig.xml: requestParsers enableRemoteStreaming=true formdataUploadLimitInKB=2048000 multipartUploadLimitInKB=2048000 / but I am still getting the errors when I put a second server in the cloud. Single servers (external zookeeper, but no cloud partner) works just fine. I suppose my next step is to try Tomcat, but according to your post, it will not help! Any help is appreciated, M. - Original Message - From: Sai Gadde gadde@gmail.com To: solr-user@lucene.apache.org Sent: Monday, October 28, 2013 7:10:41 AM Subject: Solr 4.5.1 replication Bug? Illegal to have multiple roots (start tag in epilog?). we have a similar error as this thread. http://www.mail-archive.com/solr-user@lucene.apache.org/msg90748.html Tried tomcat setting from this post. We used exact setting sepecified here. we merge 500 documents at a time. I am creating a new thread because Michael is using Jetty where as we use Tomcat. formdataUploadLimitInKB and multipartUploadLimitInKB limits are set to very high value 2GB. As suggested in the following thread. https://issues.apache.org/jira/browse/SOLR-5331 We use out of the box Solr 4.5.1 no customization done. If we merge documents via SolrJ to a single server it is perfectly working fine. But as soon as we add another node to the cloud we are getting following while merging documents. This is the error we are getting on the server (10.10.10.116 - IP is irrelavent just for clarity)where merging is happening. 10.10.10.119 is the new node here. 
This server gets RemoteSolrException:

shard update error StdNode: http://10.10.10.119:8980/solr/mycore/:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Illegal to have multiple roots (start tag in epilog?).
 at [row,col {unknown-source}]: [1,12468]
	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:425)
	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
	at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:401)
	at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:1)
	at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
	at java.util.concurrent.FutureTask.run(Unknown Source)
	at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
	at java.util.concurrent.FutureTask.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)

On the other server (10.10.10.119) we get the following error:

org.apache.solr.common.SolrException: Illegal to have multiple roots (start tag in epilog?).
 at [row,col {unknown-source}]: [1,12468]
	at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:176)
	at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
	at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
	at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:936)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
	at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1004)
	at
Re: Compound words
Hi Roman, thanks for the link, will go through it. Erick, will try with expand=true once and check out the results; will update this thread with the findings. I remember we rejected expand=true because of some weird spaghetti problem. Will check it out again. Thanks, Parvesh Garg http://www.zettata.com On Mon, Oct 28, 2013 at 9:01 PM, Roman Chyla roman.ch...@gmail.com wrote: Hi Parvesh, I think you should check the following JIRA: https://issues.apache.org/jira/browse/SOLR-5379. You will find links there to other possible solutions/problems :-) Roman On 28 Oct 2013 09:06, Erick Erickson erickerick...@gmail.com wrote: Consider setting expand=true at index time. That puts all the tokens in your index, and then you may not need any synonym processing at query time, since all the variants will already be in the index. As it is, you've replaced the words in the original with synonyms, essentially collapsing them down to a single word, and then you have to do something at query time to get matches. If all the variants are in the index, you shouldn't have to. That's what I meant by raw. Best, Erick On Mon, Oct 28, 2013 at 8:02 AM, Parvesh Garg parv...@zettata.com wrote: Hi Erick, Thanks for the suggestion. Like I said, I'm an infant. We tried synonyms both ways, sea biscuit => seabiscuit and seabiscuit => sea biscuit, and didn't understand exactly how it worked. But I just checked the analysis tool, and it seems to work perfectly fine at index time. Now I can happily discard my own filter and 4 days of work. I'm happy I got to know a few ways on how/when not to write a Solr filter :) I tried the string sea biscuit sea bird with expand=false, and the tokens I got were seabiscuit, sea, and bird at positions 1, 2, and 3 respectively.
But at query time, when I enter the same term sea biscuit sea bird, using edismax with qf, pf2, and pf3, the parsedQuery looks like this: +((text:sea) (text:biscuit) (text:sea) (text:bird)) ((text:\biscuit sea\) (text:\sea bird\)) ((text:\seabiscuit sea\) (text:\biscuit sea bird\)) What I wanted instead was this: +((text:seabiscuit) (text:sea) (text:bird)) ((text:\seabiscuit sea\) (text:\sea bird\)) (text:\seabiscuit sea bird\) Looks like there isn't any other way than to pre-process the query myself and create the compound word. What do you mean by just query the raw string? Am I still missing something? Parvesh Garg http://www.zettata.com (This time I did remove my phone number :) ) On Mon, Oct 28, 2013 at 4:14 PM, Erick Erickson erickerick...@gmail.com wrote: Why did you reject using synonyms? You can have multi-word synonyms just fine at index time, and at query time, since the multiple words are already substituted in the index, you don't need to do the same substitution; just query the raw strings. I freely acknowledge you may have very good reasons for doing this yourself, I'm just making sure you know what's already there. See: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory Look particularly at the explanations for sea biscuit in that section. Best, Erick On Mon, Oct 28, 2013 at 3:47 AM, Parvesh Garg parv...@zettata.com wrote: One more thing, Is there a way to remove my accidentally sent phone number in the signature from the previous mail? aarrrggghhh
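Erick's index-time suggestion can be sketched as follows (a sketch only; the synonyms.txt entry and filter placement are assumptions based on the thread, not Parvesh's actual config). An equivalence entry, expanded at index time, puts all variants in the index:

```
# synonyms.txt: comma-separated equivalence set
seabiscuit, sea biscuit
```

applied in the index-time analyzer:

```
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>
```

With both forms indexed, the query-side analyzer can drop its synonym filter entirely, so a query for either "seabiscuit" or "sea biscuit" matches without any query pre-processing.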
Single multilingual field analyzed based on other field values
Hello, First some background... I am indexing a multilingual document set where the documents themselves can contain multiple languages. The language(s) within my documents are known ahead of time. I have tried separate fields per language, and due to the poor query performance I'm seeing with that approach (many languages / fields), I'm trying to create a single multilingual field. One approach to this problem is given in Section 14.6.4 (https://docs.google.com/a/basistech.com/file/d/0B3NlE_uL0pqwR0hGV0M1QXBmZm8/edit) of the new Solr In Action book. The approach is to take the document content field and prepend it with the list of contained languages followed by a special delimiter. A new field type is defined that maps languages to sub-field types, and the new type's tokenizer then runs all of the sub-field type analyzers over the field and merges results, adjusts offsets for the prepended data, etc. Due to the tokenizer complexity incurred, I'd like to pursue a more flexible approach, which is to run the various language-specific analyzers not based on prepended codes, but instead based on other field values (i.e., a language field). I don't see a straightforward way to do this, mostly because a field analyzer doesn't have access to the rest of the document. On the flip side, an UpdateRequestProcessor would have access to the document but doesn't really give a path to wind up where I want to be (a single field with different analyzers run dynamically). Finally, my question: is it possible to thread-cache document language(s) during UpdateRequestProcessor execution (where we have access to the full document), so that the analyzer can then read from the cache to determine which analyzer(s) to run? More specifically, if a document is run through its URP chain on thread T, will its analyzer(s) also run on thread T, and will no other documents be run through the URP on that thread in the interim? Thanks, Dave
Re: Single multilingual field analyzed based on other field values
Consider an update processor - it can operate on any field and has access to all fields. You could have one update processor combine all the fields to process into a temporary, dummy field. Then run a language-detection update processor on the combined field. Then process the results and place them in the desired field. And finally remove any temporary fields. -- Jack Krupansky -Original Message- From: David Anthony Troiano Sent: Monday, October 28, 2013 4:47 PM To: solr-user@lucene.apache.org Subject: Single multilingual field analyzed based on other field values Hello, First some background... I am indexing a multilingual document set where the documents themselves can contain multiple languages. The language(s) within my documents are known ahead of time. I have tried separate fields per language, and due to the poor query performance I'm seeing with that approach (many languages / fields), I'm trying to create a single multilingual field. One approach to this problem is given in Section 14.6.4 (https://docs.google.com/a/basistech.com/file/d/0B3NlE_uL0pqwR0hGV0M1QXBmZm8/edit) of the new Solr In Action book. The approach is to take the document content field and prepend it with the list of contained languages followed by a special delimiter. A new field type is defined that maps languages to sub-field types, and the new type's tokenizer then runs all of the sub-field type analyzers over the field and merges results, adjusts offsets for the prepended data, etc. Due to the tokenizer complexity incurred, I'd like to pursue a more flexible approach, which is to run the various language-specific analyzers not based on prepended codes, but instead based on other field values (i.e., a language field). I don't see a straightforward way to do this, mostly because a field analyzer doesn't have access to the rest of the document.
On the flip side, an UpdateRequestProcessor would have access to the document but doesn't really give a path to wind up where I want to be (a single field with different analyzers run dynamically). Finally, my question: is it possible to thread-cache document language(s) during UpdateRequestProcessor execution (where we have access to the full document), so that the analyzer can then read from the cache to determine which analyzer(s) to run? More specifically, if a document is run through its URP chain on thread T, will its analyzer(s) also run on thread T, and will no other documents be run through the URP on that thread in the interim? Thanks, Dave
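Jack's recipe (combine, detect, place, clean up) could be wired up roughly like this. This is a sketch only: the field names title, body, _all_text_tmp, and language are hypothetical, it assumes the langid contrib jar is on the classpath, and the chain still has to be referenced from your update handler:

```
<updateRequestProcessorChain name="langid-combined">
  <!-- 1. Combine the source fields into a temporary, dummy field -->
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">title</str>
    <str name="source">body</str>
    <str name="dest">_all_text_tmp</str>
  </processor>
  <!-- 2. Run language detection on the combined field -->
  <processor class="solr.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <str name="langid.fl">_all_text_tmp</str>
    <str name="langid.langField">language</str>
  </processor>
  <!-- 3. Drop the temporary field before indexing -->
  <processor class="solr.IgnoreFieldUpdateProcessorFactory">
    <str name="fieldName">_all_text_tmp</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The "process the results and place in the desired field" step would then be custom code (or a further processor) that uses the detected language field to build whatever single multilingual field layout is wanted.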
Re: Index JTS Point in Solr/Lucene index
Just following up on this thread after a round of emails between Shahbaz and me… David Smiley wrote: Ooooh, I see your confusion. You looked at code in an UpdateRequestProcessor and expected it to work on the client in SolrJ. It won't work, for the reason that the code in the URP is creating a non-string object (a Shape subclass), whereas SolrJ expects Strings or numbers. You need to use Shape-formatted strings. If you have a generic Shape and want to serialize it to a String without special-casing Point, etc., then you can use SpatialContext.toString(shape). Shahbaz lodhi wrote: Hi, *Story:* I am trying to index a *JTS point* in the following format, not successfully though: Pt(x=55.76056,y=24.19167) It is the format that I get from ctx.readShape(shapeString). I don't get any error when reading the shape or adding it to the SolrInputDocument, but it prompts *error reading WKT* on adding the document to Solr (i.e. solrServer.add(solrInputDocument)). *Question:* Is it a legal way to index: solrInputDocument.addField(myGeoField, JtsSpatialContext.GEO.readShape(shapeString)); solr.add(solrInputDocument); or will I have to stick to the WKT format? Any help will be highly appreciated. Thanks, Shahbaz - Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book -- View this message in context: http://lucene.472066.n3.nabble.com/Index-JTS-Point-in-Solr-Lucene-index-tp4095395p4098139.html Sent from the Solr - User mailing list archive at Nabble.com.
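The practical upshot of David's answer is to send the point as a WKT string rather than a Shape object. A minimal sketch of building that string in plain Java (the WktPoint class and toWkt helper are illustrative; the commented-out SolrJ calls and the myGeoField name come from Shahbaz's snippet):

```java
import java.util.Locale;

public class WktPoint {
    // Build a WKT point string: "POINT(x y)", i.e. longitude then latitude.
    static String toWkt(double x, double y) {
        return String.format(Locale.ROOT, "POINT(%s %s)", x, y);
    }

    public static void main(String[] args) {
        // Same coordinates as the Pt(x=55.76056,y=24.19167) in the thread
        String wkt = toWkt(55.76056, 24.19167);
        System.out.println(wkt); // POINT(55.76056 24.19167)
        // With SolrJ (from Shahbaz's code), index the string instead of a Shape:
        // solrInputDocument.addField("myGeoField", wkt);
        // solrServer.add(solrInputDocument);
    }
}
```

Note the argument order: WKT puts x (longitude) before y (latitude), matching the Pt(x=…,y=…) form that ctx.readShape() printed.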
Re: Global User defined properties - solr.xml from Solr 4.4 to Solr 4.5
Done https://issues.apache.org/jira/browse/SOLR-5398 -- View this message in context: http://lucene.472066.n3.nabble.com/Global-User-defined-properties-solr-xml-from-Solr-4-4-to-Solr-4-5-tp4097740p4098143.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Background merge errors with Solr 4.4.0 on Optimize call
Thanks for your response. You were right, Solr is logging to the catalina.out file for Tomcat. When I click the optimize button in Solr's admin interface, the following logs are written: http://apaste.info/laup About JVM memory, Solr's admin interface is listing JVM memory at 3.1% (221.7MB is dark grey, 512.56MB light grey, and 6.99GB total). On Mon, Oct 28, 2013 at 6:29 AM, Erick Erickson erickerick...@gmail.com wrote: For Tomcat, the Solr output is often put into catalina.out as a default, so the output might be there. You can configure Solr to send the logs most anywhere you please, but without some specific setup on your part the log output just goes to the default for the servlet container. I took a quick glance at the code, but since the merges are happening in the background, there's not much context for where that error is thrown. How much memory is there for the JVM? I'm grasping at straws a bit... Erick On Sun, Oct 27, 2013 at 9:54 PM, Matthew Shapiro m...@mshapiro.net wrote: I am working on implementing Solr as the search backend for our web system. So far things have been going well, but today I made some schema changes and now things have broken. I updated the schema.xml file and reloaded the core (via the admin interface). No errors were reported in the logs. I then pushed 100 records to be indexed. A call to commit afterwards seemed fine; however, my next call to optimize caused the following errors: java.io.IOException: background merge hit exception: _2n(4.4):C4263/154 _30(4.4):C134 _32(4.4):C10 _31(4.4):C10 into _37 [maxNumSegments=1] null:java.io.IOException: background merge hit exception: _2n(4.4):C4263/154 _30(4.4):C134 _32(4.4):C10 _31(4.4):C10 into _37 [maxNumSegments=1] Unfortunately, googling for background merge hit exception came up with two things: a corrupt index or not enough free space. The host machine hosting Solr has 227 of 229GB free (according to df -h), so that's not it.
I then ran CheckIndex on the index, and got the following results: http://apaste.info/gmGU As someone who is new to Solr and Lucene, as far as I can tell this means my index is fine, so I am at a loss. I'm fairly sure I could delete my data directory and rebuild it, but I am more interested in finding out why it is having issues, what the best way to fix it is, and what the best way is to prevent it from happening when this goes into production. Does anyone have any advice that may help? As an aside, I do not have a stack trace for you because the Solr admin page isn't giving me one. I tried looking in the logs directory under my Solr directory, but it does not contain any logs. I opened my ~/tomcat/lib/log4j.properties file and saw http://apaste.info/0rTL, which didn't really help me find log files. Doing a 'find . | grep solr.log' didn't really help either. Any help finding the log files (which may help find the actual cause of this) would also be appreciated.
Re: Background merge errors with Solr 4.4.0 on Optimize call
Sorry for reposting right after I just sent a reply, but I looked at the error trace more closely and noticed: Caused by: java.lang.IllegalArgumentException: no such field what The 'what' field was removed at the customer's request, as they wanted the logic behind what gets queried in the what field to live in code instead of in Solr (for easier changes without having to re-index everything; I didn't feel strongly either way, and since they are paying me, I took it out). This makes me wonder if it's crashing while merging because a field that used to be there is now gone. However, this seems odd to me, as Solr doesn't even let me delete the old data and instead leaves my collection in an extremely bad state, with the only remedy I can think of being to nuke the index at the filesystem level. If this is indeed the cause of the crash, is the only way to delete a field to completely empty the index first? On Mon, Oct 28, 2013 at 6:34 PM, Matthew Shapiro m...@mshapiro.net wrote: Thanks for your response. You were right, Solr is logging to the catalina.out file for Tomcat. When I click the optimize button in Solr's admin interface, the following logs are written: http://apaste.info/laup About JVM memory, Solr's admin interface is listing JVM memory at 3.1% (221.7MB is dark grey, 512.56MB light grey, and 6.99GB total). On Mon, Oct 28, 2013 at 6:29 AM, Erick Erickson erickerick...@gmail.com wrote: For Tomcat, the Solr output is often put into catalina.out as a default, so the output might be there. You can configure Solr to send the logs most anywhere you please, but without some specific setup on your part the log output just goes to the default for the servlet container. I took a quick glance at the code, but since the merges are happening in the background, there's not much context for where that error is thrown. How much memory is there for the JVM? I'm grasping at straws a bit...
Erick On Sun, Oct 27, 2013 at 9:54 PM, Matthew Shapiro m...@mshapiro.net wrote: I am working at implementing solr to work as the search backend for our web system. So far things have been going well, but today I made some schema changes and now things have broken. I updated the schema.xml file and reloaded the core (via the admin interface). No errors were reported in the logs. I then pushed 100 records to be indexed. A call to Commit afterwards seemed fine, however my next call for Optimize caused the following errors: java.io.IOException: background merge hit exception: _2n(4.4):C4263/154 _30(4.4):C134 _32(4.4):C10 _31(4.4):C10 into _37 [maxNumSegments=1] null:java.io.IOException: background merge hit exception: _2n(4.4):C4263/154 _30(4.4):C134 _32(4.4):C10 _31(4.4):C10 into _37 [maxNumSegments=1] Unfortunately, googling for background merge hit exception came up with 2 thing: a corrupt index or not enough free space. The host machine that's hosting solr has 227 out of 229GB free (according to df -h), so that's not it. I then ran CheckIndex on the index, and got the following results: http://apaste.info/gmGU As someone who is new to solr and lucene, as far as I can tell this means my index is fine. So I am coming up at a loss. I'm somewhat sure that I could probably delete my data directory and rebuild it but I am more interested in finding out why is it having issues, what is the best way to fix it, and what is the best way to prevent it from happening when this goes into production. Does anyone have any advice that may help? As an aside, i do not have a stacktrace for you because the solr admin page isn't giving me one. I tried looking in my logs file in my solr directory, but it does not contain any logs. I opened up my ~/tomcat/lib/log4j.properties file and saw http://apaste.info/0rTL, which didnt really help me find log files. Doing a 'find . | grep solr.log' didn't really help either. 
Any help for finding log files (which may help find the actual cause of this) would also be appreciated.
Solr 4.5.1 Overseer error
I am upgrading from 4.4 to 4.5.1. I used to just upload my configurations to ZooKeeper and then install Solr with no default core. Solr would give me an error that no cores were created when I tried to access it, until I ran the Collections API create command to make a collection. However, now when I try to install Solr with no default core, I get a generic error about "path cannot end with /", and I can't create the cores using the Collections API. When I manually copy the files over and create the core through the interface, it all works as expected. Any help would be appreciated. Here is the error I'm seeing: http://pastebin.com/cEfpSEqe here's my solr.xml: http://pastebin.com/kBLv9Vvt and here are my startup arguments: http://pastebin.com/7tCrSpX9 -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-5-1-Overseer-error-tp4098160.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Single multilingual field analyzed based on other field values
Hi David, What version of the Solr in Action MEAP are you looking at (the current version is 12, version 13 is coming out later this week, and prior versions had significant bugs in the code you are referencing)? I added an update processor in the most recent version that can do language identification and prepend the language codes for you (even removing them from the stored version of the field and only including them on the indexed version for text analysis). You could easily modify this update processor to read the value from the language field and use it as the basis of the prepended languages. Otherwise, if you want to do language detection instead of passing in the language manually, the MultiTextField in chapter 14 of Solr in Action and the corresponding MultiTextFieldLanguageIdentifierUpdateProcessor should handle all of the language detection and prepending automatically for you (and also append the identified language to a separate field). If it were easy/possible to have access to the rest of the fields in the document from within a field's Analyzer, then I would certainly have opted for that approach instead of the whole prepending-languages-to-content option. If it is too cumbersome, you could probably rewrite the MultiTextField to pull the language from the field name instead of the content (i.e. <field name="myField|en,fr">blah, blah</field> instead of <field name="myField">en,fr|blah, blah</field> as currently designed). This would make specifying the language much easier (especially at query time, since you only have to specify the languages once instead of on each term), and you could have Solr still search the same underlying field for all languages. Same general idea, though. In terms of your ThreadLocal cache idea... that sounds really scary to me. The Analyzers' TokenStreamComponents are cached in a ThreadLocal context according to the internal ReusePolicy, and I'm skeptical that you'll be able to pull this off cleanly.
It would really be hacking around the Lucene APIs even if you were able to pull it off. -Trey On Mon, Oct 28, 2013 at 5:15 PM, Jack Krupansky j...@basetechnology.com wrote: Consider an update processor - it can operate on any field and has access to all fields. You could have one update processor combine all the fields to process into a temporary, dummy field. Then run a language-detection update processor on the combined field. Then process the results and place them in the desired field. And finally remove any temporary fields. -- Jack Krupansky -Original Message- From: David Anthony Troiano Sent: Monday, October 28, 2013 4:47 PM To: solr-user@lucene.apache.org Subject: Single multilingual field analyzed based on other field values Hello, First some background... I am indexing a multilingual document set where the documents themselves can contain multiple languages. The language(s) within my documents are known ahead of time. I have tried separate fields per language, and due to the poor query performance I'm seeing with that approach (many languages / fields), I'm trying to create a single multilingual field. One approach to this problem is given in Section 14.6.4 (https://docs.google.com/a/basistech.com/file/d/0B3NlE_uL0pqwR0hGV0M1QXBmZm8/edit) of the new Solr In Action book. The approach is to take the document content field and prepend it with the list of contained languages followed by a special delimiter. A new field type is defined that maps languages to sub-field types, and the new type's tokenizer then runs all of the sub-field type analyzers over the field and merges results, adjusts offsets for the prepended data, etc. Due to the tokenizer complexity incurred, I'd like to pursue a more flexible approach, which is to run the various language-specific analyzers not based on prepended codes, but instead based on other field values (i.e., a language field).
I don't see a straightforward way to do this, mostly because a field analyzer doesn't have access to the rest of the document. On the flip side, an UpdateRequestProcessor would have access to the document but doesn't really give a path to wind up where I want to be (a single field with different analyzers run dynamically). Finally, my question: is it possible to thread-cache document language(s) during UpdateRequestProcessor execution (where we have access to the full document), so that the analyzer can then read from the cache to determine which analyzer(s) to run? More specifically, if a document is run through its URP chain on thread T, will its analyzer(s) also run on thread T, and will no other documents be run through the URP on that thread in the interim? Thanks, Dave
Re: Solr 4.5.1 Overseer error
On 10/28/2013 5:50 PM, dboychuck wrote: I am upgrading from 4.4 to 4.5.1 I used to just upload my configurations to zookeeper and then install solr with no default core Solr would give me an error that no cores were created when I tried to access until I ran the collections API create command to make a collection however now when I try to install solr with no default core I get a generic error about path cannot end with / and I can't create the cores using the collections api when I manually copy the files over and create the core through the interface it all works as expected any help would be appreciated Working on IRC, we were able to track this down to a work item in the overseer queue in ZooKeeper. It had a deletecore operation in the queue with the collection parameter set to an empty string: {"operation":"deletecore", "core_node_name":"solr-shard-1.REDACTED.com:__collection1", "core":"collection1", "collection":"", "node_name":"solr-shard-1.REDACTED.com:_"} Basically, the previous version left behind some bad data in ZooKeeper. When dboychuck wiped out all the ZooKeeper data and started over, it all worked. If you are seeing a "Path must not end with / character" error when starting Solr, you may have some bad data in the overseer queue, which is located in ZooKeeper. Would it be worthwhile to file a bug so Solr can deal with these problems automatically and log what it's doing, or at the very least output a better error message? Thanks, Shawn
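For anyone hitting this, the overseer work queue can be inspected by hand before resorting to wiping all of ZooKeeper's data. A sketch of a zkCli session (the znode name qn-0000000001 is illustrative; your queue entries will have different sequence numbers):

```
# From the ZooKeeper installation, connect and inspect the overseer queue
./zkCli.sh -server localhost:2181
ls /overseer/queue
get /overseer/queue/qn-0000000001
```

A single stale work item, such as a deletecore entry with an empty collection, could then be removed with zkCli's delete command on that znode (with Solr stopped) instead of clearing everything.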
Re: how to avoid recover? how to ensure a recover success?
I have had a similar problem before, but the patch included with version 4.1 fixed that... I couldn't reproduce the problem with the patch... Is anyone able to reproduce this exception? - Zeki ama calismiyor... Calissa yapar... ("Smart, but doesn't work... Would manage it if he did...") -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-avoid-recover-how-to-ensure-a-recover-success-tp4096777p4098166.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 4.5.1 replication Bug? Illegal to have multiple roots (start tag in epilog?).
Hi Michael, I downgraded to Solr 4.4.0 and this issue is gone. No additional settings or tweaks were done. This is not a fix or a real solution, I guess, but in our case we wanted something working and we were running out of time. I will watch this thread for any suggestions, but we will possibly stay with 4.4.0 for some time. Regards Sai On Tue, Oct 29, 2013 at 4:36 AM, Michael Tracey mtra...@biblio.com wrote: Hey, this is Michael, who was having the exact same error on the Jetty side with an update. I've upgraded Jetty from the version embedded in the Solr 4.5.1 example directory to version 9.0.6, which meant I had to upgrade my OpenJDK from 1.6 to 1.7.0_45. I also added the suggested (very large) settings to my solrconfig.xml: <requestParsers enableRemoteStreaming="true" formdataUploadLimitInKB="2048000" multipartUploadLimitInKB="2048000" /> but I am still getting the errors when I put a second server in the cloud. A single server (external ZooKeeper, but no cloud partner) works just fine. I suppose my next step is to try Tomcat, but according to your post, it will not help! Any help is appreciated, M. - Original Message - From: Sai Gadde gadde@gmail.com To: solr-user@lucene.apache.org Sent: Monday, October 28, 2013 7:10:41 AM Subject: Solr 4.5.1 replication Bug? Illegal to have multiple roots (start tag in epilog?). We have an error similar to this thread: http://www.mail-archive.com/solr-user@lucene.apache.org/msg90748.html Tried the Tomcat setting from this post; we used the exact setting specified there. We merge 500 documents at a time. I am creating a new thread because Michael is using Jetty whereas we use Tomcat. The formdataUploadLimitInKB and multipartUploadLimitInKB limits are set to a very high value (2GB), as suggested in the following thread: https://issues.apache.org/jira/browse/SOLR-5331 We use out-of-the-box Solr 4.5.1, no customization done. If we merge documents via SolrJ to a single server it works perfectly fine.
But as soon as we add another node to the cloud we get the following while merging documents. This is the error we get on the server where merging is happening (10.10.10.116; the IP is irrelevant, just for clarity). 10.10.10.119 is the new node here. This server gets RemoteSolrException:

shard update error StdNode: http://10.10.10.119:8980/solr/mycore/:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Illegal to have multiple roots (start tag in epilog?).
 at [row,col {unknown-source}]: [1,12468]
	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:425)
	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
	at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:401)
	at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:1)
	at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
	at java.util.concurrent.FutureTask.run(Unknown Source)
	at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
	at java.util.concurrent.FutureTask.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)

On the other server (10.10.10.119) we get the following error:

org.apache.solr.common.SolrException: Illegal to have multiple roots (start tag in epilog?).
 at [row,col {unknown-source}]: [1,12468]
	at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:176)
	at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
	at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
	at
Re: Apache-Solr with Tomcat: displaying the format of search result
Thanks Shawn for the quick response... As suggested, I verified my configuration to check whether any update processors are configured, and found none. I am just wondering how the format is getting changed. Let me explain my problem in detail. I am indexing an .xml file into Solr, and below is the field configuration.

*schema.xml* (given for one field)
=
*1. Field:*
<field name="Resolution" type="text_general" indexed="true" multiValued="true" stored="true"/>

*2. Field type tokenizers:*
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.PositionFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" catenateAll="1"/>
  </analyzer>
</fieldType>

3. *Before indexing, the data is in the below format:*
<field name="Resolution">*Issue:* ID country user X; unable xxx wsdfsdfs sdsdfs *Impact / Suspected Impact*: asa asdasdaav asdffcasdfassd *Rootcause:* asdfas asdfasdwersdvsdv sdfsdfcss (1). test 12, (2).tesst 123</field>

4. *After indexing, the data is displayed in the below format:*
*Issue:* ID country user X; unable xxx wsdfsdfs sdsdfs*Impact / Suspected Impact*: asa asdasdaav asdffcasdfassd *Rootcause:* asdfas asdfasdwersdvsdv sdfsdfcss (1). test 12, (2).tesst 123

Could you please guide me on how to display the search result in the same format as the input, even after indexing.
If any update processors need to be added, please suggest which processors to add. Thanks in advance -- View this message in context: http://lucene.472066.n3.nabble.com/Apache-Solr-with-Tomcat-displaying-the-input-format-in-the-search-results-tp4098040p4098183.html Sent from the Solr - User mailing list archive at Nabble.com.