How to handle special characters in fuzzy search query

2015-05-07 Thread Madhav Bahuguna
So my Solr query is implemented in two parts: the first query does an exact
search, and if there are no results for the exact search it falls back to a
second query that does a fuzzy search.
Everything works fine except in situations like this: a user enters "burg +".
The exact search returns no records, so the second query is called to do a
fuzzy search. Now comes the problem: my fuzzy query does not understand
special characters like +, - and *, which throws an error. If I don't pass
special characters it works fine. But in the real world a user can put such
characters in their search, which will throw an error.
Now I am stuck on this and don't know how to resolve the issue.
This is what my exact search query looks like:

$query1="(business_name:$data*^100 OR city_name:$data*^1 OR
locality_name:$data*^6 OR business_search_tag_name:$data*^8 OR
type_name:$data*^7) AND (business_active_flag:1) AND
(business_visible_flag:1) AND (delete_status_businessmasters:0)";

This is what my fuzzy query looks like:


$query2='(_query_:%20"{!complexphrase%20qf=business_name^100+type_name^0.4+locality_name^6%27}%20'.$url_new.')AND(business_active_flag:1)AND(business_point:[1.5
TO 2.0])&q.op=AND&wt=json&indent=true';

I am new to Solr and don't know how to tackle this situation.
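For what it's worth, the usual fix is to escape Lucene's query metacharacters
in the raw user input before building the second query. A minimal sketch in
PHP (assuming the metacharacter list from the Lucene query parser syntax
docs; solr-php-client does not escape for you):

function escapeSolrQueryChars($input) {
    // Escape the Lucene query metacharacters:
    // + - && || ! ( ) { } [ ] ^ " ~ * ? : \ /
    $pattern = '/([+\-!(){}\[\]^"~*?:\\\\\/]|&&|\|\|)/';
    return preg_replace($pattern, '\\\\$1', $input);
}

For example, escapeSolrQueryChars('burg +') returns 'burg \+', which the
fuzzy query can then parse instead of throwing an error.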

Details
Solrphpclient
php
solr 4.9


-- 
Regards
Madhav Bahuguna


Re: solr.war built from solr 4.7.2 not working

2015-05-07 Thread Rahul Singh
response inline.

On Thu, May 7, 2015 at 7:01 PM, Shawn Heisey  wrote:

> On 5/7/2015 3:43 AM, Rahul Singh wrote:
> >   I have tried to deploy solr.war from building it from 4.7.2 but it is
> > showing the below mentioned error. Has anyone faced the same? any lead
> > would also be appreciated.
> >
> > Error Message:
> >
> > {
> >   "responseHeader": {
> > "status": 500,
> > "QTime": 33
> >   },
> >   "error": {
> > "msg": "parsing error",
> > "trace":
> > "org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
> > parsing error
>
> Did you change the source code in any way before you compiled it?  You
> haven't said what you're actually doing that resulted in this error, or
> given any other details about your setup.  It's good that you've given
> us the full response with the error, but additional details, like the
> request that generated the error and any errors found in the Solr log,
> are important.
>
> just made a few build-file changes to include my jar, which overrides the
Lucene default similarity.

The logs show the following error...

ERROR - 2015-05-08 11:15:25.738; org.apache.solr.common.SolrException;
null:java.lang.IllegalArgumentException: You cannot set an index-time boost on an unindexed field, or one that omits norms
at org.apache.lucene.document.Field.setBoost(Field.java:452)
at org.apache.lucene.document.DocumentStoredFieldVisitor.stringField(DocumentStoredFieldVisitor.java:75)
at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.readField(CompressingStoredFieldsReader.java:187)
at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.visitDocument(CompressingStoredFieldsReader.java:351)
at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:287)
at org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeReader.java:110)
at org.apache.lucene.index.IndexReader.document(IndexReader.java:446)
at org.apache.solr.search.SolrIndexSearcher.doc(SolrIndexSearcher.java:659)
at org.apache.solr.response.BinaryResponseWriter$Resolver.writeResultsBody(BinaryResponseWriter.java:147)
at org.apache.solr.response.BinaryResponseWriter$Resolver.writeResults(BinaryResponseWriter.java:174)
at org.apache.solr.response.BinaryResponseWriter$Resolver.resolve(BinaryResponseWriter.java:87)
at org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:158)
at org.apache.solr.common.util.JavaBinCodec.writeNamedList(JavaBinCodec.java:148)
at org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:242)
at org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:153)
at org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:96)
at org.apache.solr.response.BinaryResponseWriter.write(BinaryResponseWriter.java:51)
at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:749)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:428)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:205)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1040)

> Because the error comes from HttpSolrServer and is embedded in a Solr
> response, I'm guessing this is a distributed request ... but I can't
> tell if it's SolrCloud or "manual" sharding.
>
It's SolrCloud sharding: a SolrCloud implementation with two nodes, both of
them using the same war.


> With no other information to go on, I do have some possible ideas:
>
> You might have changed something fundamental in the source code that
> makes the distributed request incompatible with the target core/server.
>
> There might be mixed versions ... either multiple copies of jars on the
> classpath from different versions of Solr, or a version with your code
> changes trying to talk to another instance without your changes.
>
> Are there error messages in the Solr log on the instance of Solr that
> received the distributed request?
>
> Thanks,
> Shawn

Re: Proximity searching in percentage

2015-05-07 Thread Zheng Lin Edwin Yeo
Thank you for the information.

I'm currently using fuzzy search with the edit distance value set to ~0.79,
which allows a 20% error rate (i.e. for words with 5 characters it allows
1 misspelled character, and for words with 10 characters it allows 2
misspelled characters).

However, for words with 4 characters I'll need to set the value to ~0.75 to
allow 1 misspelled character, since accommodating 1 misspelled character in a
4-character word requires a 25% error rate. We probably will not accommodate
3-character words.
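As a sketch of that length-based rule in client code (not from this thread;
it assumes the integer edit-distance fuzzy syntax, term~N, available since
Solr 4.0, where Lucene caps the distance at 2 edits):

function fuzzyTerm($term, $errorRate = 0.2) {
    // 5 chars * 0.2 -> 1 edit; 10 chars * 0.2 -> 2 edits;
    // clamp at Lucene's hard limit of 2 edits.
    $maxEdits = min(2, (int) floor(strlen($term) * $errorRate));
    return $maxEdits > 0 ? $term . '~' . $maxEdits : $term;
}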

I've gotten the information from here:
http://lucene.apache.org/core/3_6_0/queryparsersyntax.html#Fuzzy%20Searches


Just to check, will this affect the performance of the system?

Regards,
Edwin


On 7 May 2015 at 20:00, Alessandro Benedetti 
wrote:

> Hi !
> Currently Solr builds an FST to provide proper fuzzy search or spellcheck
> suggestions based on string distance.
> The current default algorithm is the Levenshtein distance (which returns
> the number of edits as the distance metric).
> In your case you should calculate, client side, the number of edits you
> want to apply to your search.
> In your client code it should not be difficult to process the query and
> apply the proper number of edits depending on the length.
>
> Anyway, the max edits for the default Levenshtein distance is fixed at 2.
>
> Cheers
>
>
>
> 2015-05-05 10:24 GMT+01:00 Zheng Lin Edwin Yeo :
>
> > Hi,
> >
> > Would like to check, how do we implement character proximity searching
> > that's in terms of percentage with regards to the length of the word,
> > instead of a fixed number of edit distance (characters)?
> >
> > For example, if we have a proximity of 20%, a word with 5 characters will
> > have an edit distance of 1, and a word with 10 characters will
> > automatically have an edit distance of 2.
> >
> > Will Solr be able to do that for us?
> >
> > Regards,
> > Edwin
> >
>
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>


Re: Not able to Add docValues in Solr

2015-05-07 Thread pras.venkatesh
Thanks for your reply. I was under the impression that clusterstate.json and
aliases.json are the two files that ZooKeeper maintains; I did not know that
schema.xml is also maintained by ZooKeeper. Yes, I am running SolrCloud.
Should I use the ZooKeeper client to re-upload?
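For reference, in Solr 4.x the usual way is the zkcli script shipped under
example/cloud-scripts (paths and the ZooKeeper address here are illustrative):

cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd upconfig \
    -confdir /path/to/collection1/conf -confname yourconfname

and then reload the collection (e.g. the Collections API RELOAD action) so
the updated schema takes effect.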




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Not-able-to-Add-docValues-in-Solr-tp4204405p4204425.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud 4.8 and Java 8

2015-05-07 Thread Shawn Heisey
On 5/7/2015 4:36 PM, Vincenzo D'Amore wrote:
> Maybe this is a silly question, but just to be sure: is there any
> contraindication to running SolrCloud with Java 8?

Solr 4.8 should work very well running on Java 8.  I would recommend the
latest Java 8 from Oracle, version 8u45.  There are supposed to be some
very nice improvements in garbage collection starting in 8u40.

Thanks,
Shawn



Re: SolrCloud 4.8 and Java 8

2015-05-07 Thread Erick Erickson
That should be fine. In fact, the current trunk (eventual 6.0)
_requires_ Java 8.

Best,
Erick

On Thu, May 7, 2015 at 3:36 PM, Vincenzo D'Amore  wrote:
> Hi all,
>
> Maybe this is a silly question, but just to be sure: is there any
> contraindication to running SolrCloud with Java 8?
>
> Thanks for your patience,
> Vincenzo
>
> --
> Vincenzo D'Amore
> email: v.dam...@gmail.com
> skype: free.dev
> mobile: +39 349 8513251


Re: SolrCloud indexing

2015-05-07 Thread Erick Erickson
bq: ...forwards the index notation to itself and any replicas...

That's just odd phrasing.

All that means is that the document is sent through the indexing process
on the leader and all followers for a shard, and is indexed independently
on each.

This is as opposed to the old master/slave situation where the master
indexed the doc, but the slave got the indexed
version as part of a segment when it replicated.

Could you add a comment to the CWiki calling the phrasing out? It
really is a bit mysterious.

Best,
Erick

On Thu, May 7, 2015 at 2:18 PM, Vincenzo D'Amore  wrote:
> Thanks Shawn.
>
> Just to make the picture clearer, I'm trying to understand why a 3-node
> SolrCloud cluster and an old-style Solr server take the same time to index
> the same documents.
>
> But the wiki says:
>
> If the machine is a leader, SolrCloud determines which shard the document
>> should go to, forwards the document to the leader for that shard, indexes
>> the document for this shard, and *forwards the index notation to itself
>> and any replicas*.
>
>
> https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud
>
>
> Could you please explain what "forwards the index notation" means?
>
> On the other hand, on SolrCloud I have 3 shards and 2 replicas for each
> shard. So every node is indexing all the documents, and this explains why
> SolrCloud takes the same time as an old-style Solr server.
>
>
>
> On Thu, May 7, 2015 at 3:08 PM, Shawn Heisey  wrote:
>
>> On 5/7/2015 3:04 AM, Vincenzo D'Amore wrote:
>> > Thanks Erick. I'm not sure I got your answer.
>> >
>> > I'll try to recap: when a raw document has to be indexed, it is
>> > forwarded to the shard leader. The shard leader indexes the document
>> > for that shard, and then forwards the indexed document to any replicas.
>> >
>> > I just want to be sure that when the raw document is forwarded from the
>> > leader to the replicas it is indexed only once, on the shard leader.
>> > From what I understand the replicas do not index; only the leader
>> > indexes.
>>
>> The document is indexed by all replicas.  There is no way to forward the
>> indexed document, it can only forward the source document ... so each
>> replica must index it independently.
>>
>> The old-style master-slave replication (which existed long before
>> SolrCloud) copies the finished Lucene segments, so only the master
>> actually does indexing.
>>
>> SolrCloud doesn't have a master, only multiple replicas, one of which is
>> elected leader, and replication only comes into the picture if there's a
>> serious problem and Solr determines that it can't use the transaction
>> log to recover the index.
>>
>> Thanks,
>> Shawn
>>
>>
>
>
> --
> Vincenzo D'Amore
> email: v.dam...@gmail.com
> skype: free.dev
> mobile: +39 349 8513251


Re: solr 3.6.2 under tomcat 8 missing corename in path

2015-05-07 Thread Shawn Heisey
On 5/7/2015 11:03 AM, Tim Dunphy wrote:
> When I browse to /solr I see a link that points me to /solr/admin. And when
> I click on that link I see the error:
>
> *missing core name in path*
>
>
> I think my problem is that I am not listing the cores correctly in
> the solr.xml file.
>
> This is what I have in my solr.xml file:
>
> <solr persistent="false">
>   <cores adminPath="/admin/cores" defaultCoreName="collection1">
>     ...
>   </cores>
> </solr>
>
> So what I did was create a directory at solr/admin/cores and put
> collection1 there:
>
> [root@aoadbld00032la solr]# ls -ld admin/cores/collection1
> drwxr-xr-x. 5 root root 4096 May  6 17:29 admin/cores/collection1
>
> So, if I assume correctly that the way I reference the collection1
> directory is the problem, how can I express this differently in my solr.xml
> file so that it works?

The adminPath (/admin/cores) is a *URL* path, not a *filesystem* path.

Let's back up a bit.

Your tomcat config needs some way to start Solr.  This can either be
done by dropping "solr.war" into an automatic deployment directory, or
placing an XML file (context fragment) telling Tomcat how to find the
.war, the context path (usually /solr) and possibly some other settings.

https://wiki.apache.org/solr/SolrTomcat#Installing_Solr_instances_under_Tomcat

Part of the config that you can give is a directory known as the Solr
Home.  If you don't specify it, it defaults to "./solr" -- meaning a
directory named "solr" in the current working directory of the process
that actually starts your container.  I highly recommend that you
specify the solr home, either with "-Dsolr.solr.home=/path/to/solr" on
the java commandline, or with a JNDI environment variable named
"solr/home" in the context fragment.

Once you have your Solr Home figured out, you put your solr.xml file
there, and your instanceDir for each of your cores will be relative to
that directory as well.  Inside the instanceDir you must have a conf
directory, which contains solrconfig.xml and schema.xml, plus any other
config files referenced by those two files.

If you're running SolrCloud, then the config is in zookeeper, not on the
disk in the conf directory, but the instanceDir for each core works much
the same other than that.

I would recommend you have persistent set to true in your solr.xml
file.  A great number of possible problems with the CoreAdmin API can be
avoided that way.

Thanks,
Shawn



SolrCloud 4.8 and Java 8

2015-05-07 Thread Vincenzo D'Amore
Hi all,

Maybe this is a silly question, but just to be sure: is there any
contraindication to running SolrCloud with Java 8?

Thanks for your patience,
Vincenzo

-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251


Re: Not able to Add docValues in Solr

2015-05-07 Thread Shawn Heisey
On 5/7/2015 3:51 PM, pras.venkatesh wrote:
> I am trying to enable Doc Values for certain fields in Solr schema which
> helps with memory management while sorting the results and here is
> configuration I have 
>
>  omitNorms="true" docValues="true" />
>
> And the field type is defined this way:
>
> <fieldType name="..." class="..." />
>
> I deployed this schema and started the server, but I don't see the
> schema.xml reflecting the docvalues from the Solr admin UI. 
>
> When I check the schema.xml by going to collection Name -->Files, I just see
> the below entry:
>
> <field name="..." type="..." omitNorms="true" />
>
> Also, I did not find the need to re-index; the existing index is working
> just fine. So it's clear that the docValues setting I am trying to enforce
> is not taking effect.

The schema that you edited is probably not the active schema for that
index.  Are you running SolrCloud?  If you are, then the config will be
installed in zookeeper, requiring that you re-upload the config to
zookeeper to overwrite it with your changes.

If you're NOT running zookeeper (SolrCloud), then the following will apply:

Assuming that you are running 4.x or 5.x, if you select the core from
the dropdown and look at the "Overview" tab, you should see the
"instance" directory in the upper right corner.  Inside that directory
will be a "conf" directory with your schema and config inside it.  That
is what you will need to edit, then you need to reload the core or
restart Solr.
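Reloading can be done through the CoreAdmin API, with host, port and core
name adjusted to your setup, e.g.:

http://localhost:8983/solr/admin/cores?action=RELOAD&core=yourcorename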

Thanks,
Shawn



Not able to Add docValues in Solr

2015-05-07 Thread pras.venkatesh
I am trying to enable Doc Values for certain fields in Solr schema which
helps with memory management while sorting the results and here is
configuration I have:

<field name="..." type="..." omitNorms="true" docValues="true" />

And the field type is defined this way:

<fieldType name="..." class="..." />

I deployed this schema and started the server, but I don't see the
schema.xml reflecting the docvalues from the Solr admin UI. 

When I check the schema.xml by going to collection Name --> Files, I just see
the below entry:

<field name="..." type="..." omitNorms="true" />

Also, I did not find the need to re-index; the existing index is working just
fine. So it's clear that the docValues setting I am trying to enforce is not
taking effect.
Please advise what I am missing here.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Not-able-to-Add-docValues-in-Solr-tp4204405.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: JSON Facet & Analytics API in Solr 5.1

2015-05-07 Thread Frank li
Is there any book to read so I won't ask such dummy questions? Thanks.

On Thu, May 7, 2015 at 2:32 PM, Frank li  wrote:

> This one does not have a problem, but how do I include "sort" in this facet
> query. Basically, I want to write a Solr query that can sort the facet
> counts in ascending order. Something like "http://localhost:8983/solr
> /demo/query?q=apple&json.facet={field=price sort='count asc'}
> 
>
> I really appreciate your help.
>
> Frank
>
>
> 
>
> On Thu, May 7, 2015 at 2:24 PM, Yonik Seeley  wrote:
>
>> On Thu, May 7, 2015 at 4:47 PM, Frank li  wrote:
>> > Hi Yonik,
>> >
>> > I am reading your blog. It is helpful. One question for you about the
>> > following example,
>> >
>> > curl http://localhost:8983/solr/query -d 'q=*:*&rows=0&
>> >  json.facet={
>> >categories:{
>> >  type : terms,
>> >  field : cat,
>> >  sort : { x : desc},
>> >  facet:{
>> >x : "avg(price)",
>> >y : "sum(price)"
>> >  }
>> >}
>> >  }
>> > '
>> >
>> >
>> > If I want to write it in the format of this:
>> >
>> http://localhost:8983/solr/query?q=apple&json.facet={x:'avg(campaign_ult_defendant_cnt_is)'}
>> ,
>> > how do I do that?
>>
>> What problems do you encounter when you try that?
>>
>> If you try that URL with curl, be aware that curly braces {} are
>> special globbing characters in curl.  Turn them off with the "-g"
>> option:
>>
>> curl -g "
>> http://localhost:8983/solr/demo/query?q=apple&json.facet={x:'avg(price)'}
>> "
>>
>> -Yonik
>>
>
>


Re: JSON Facet & Analytics API in Solr 5.1

2015-05-07 Thread Frank li
This one does not have a problem, but how do I include "sort" in this facet
query. Basically, I want to write a Solr query that can sort the facet
counts in ascending order. Something like "http://localhost:8983/solr
/demo/query?q=apple&json.facet={field=price sort='count asc'}
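(For the record, and not something confirmed in this thread: in the 5.1 JSON
Facet API the sort option goes inside a named terms facet rather than next to
the field, so something like

json.facet={categories:{type:terms, field:price, sort:"count asc"}}

should give ascending bucket counts. Remember curl's -g flag from Yonik's
reply below if you test it on the command line.)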


I really appreciate your help.

Frank



On Thu, May 7, 2015 at 2:24 PM, Yonik Seeley  wrote:

> On Thu, May 7, 2015 at 4:47 PM, Frank li  wrote:
> > Hi Yonik,
> >
> > I am reading your blog. It is helpful. One question for you about the
> > following example,
> >
> > curl http://localhost:8983/solr/query -d 'q=*:*&rows=0&
> >  json.facet={
> >categories:{
> >  type : terms,
> >  field : cat,
> >  sort : { x : desc},
> >  facet:{
> >x : "avg(price)",
> >y : "sum(price)"
> >  }
> >}
> >  }
> > '
> >
> >
> > If I want to write it in the format of this:
> >
> http://localhost:8983/solr/query?q=apple&json.facet={x:'avg(campaign_ult_defendant_cnt_is)'}
> ,
> > how do I do that?
>
> What problems do you encounter when you try that?
>
> If you try that URL with curl, be aware that curly braces {} are
> special globbing characters in curl.  Turn them off with the "-g"
> option:
>
> curl -g "
> http://localhost:8983/solr/demo/query?q=apple&json.facet={x:'avg(price)'}"
>
> -Yonik
>


Re: JSON Facet & Analytics API in Solr 5.1

2015-05-07 Thread Yonik Seeley
On Thu, May 7, 2015 at 4:47 PM, Frank li  wrote:
> Hi Yonik,
>
> I am reading your blog. It is helpful. One question for you about the
> following example,
>
> curl http://localhost:8983/solr/query -d 'q=*:*&rows=0&
>  json.facet={
>categories:{
>  type : terms,
>  field : cat,
>  sort : { x : desc},
>  facet:{
>x : "avg(price)",
>y : "sum(price)"
>  }
>}
>  }
> '
>
>
> If I want to write it in the format of this:
> http://localhost:8983/solr/query?q=apple&json.facet={x:'avg(campaign_ult_defendant_cnt_is)'},
> how do I do that?

What problems do you encounter when you try that?

If you try that URL with curl, be aware that curly braces {} are
special globbing characters in curl.  Turn them off with the "-g"
option:

curl -g 
"http://localhost:8983/solr/demo/query?q=apple&json.facet={x:'avg(price)'}"

-Yonik


Re: SolrCloud indexing

2015-05-07 Thread Vincenzo D'Amore
Thanks Shawn.

Just to make the picture clearer, I'm trying to understand why a 3-node
SolrCloud cluster and an old-style Solr server take the same time to index
the same documents.

But the wiki says:

If the machine is a leader, SolrCloud determines which shard the document
> should go to, forwards the document to the leader for that shard, indexes
> the document for this shard, and *forwards the index notation to itself and
> any replicas*.


https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud


Could you please explain what "forwards the index notation" means?

On the other hand, on SolrCloud I have 3 shards and 2 replicas for each
shard. So every node is indexing all the documents, and this explains why
SolrCloud takes the same time as an old-style Solr server.



On Thu, May 7, 2015 at 3:08 PM, Shawn Heisey  wrote:

> On 5/7/2015 3:04 AM, Vincenzo D'Amore wrote:
> > Thanks Erick. I'm not sure I got your answer.
> >
> > I'll try to recap: when a raw document has to be indexed, it is
> > forwarded to the shard leader. The shard leader indexes the document for
> > that shard, and then forwards the indexed document to any replicas.
> >
> > I just want to be sure that when the raw document is forwarded from the
> > leader to the replicas it is indexed only once, on the shard leader.
> > From what I understand the replicas do not index; only the leader
> > indexes.
>
> The document is indexed by all replicas.  There is no way to forward the
> indexed document, it can only forward the source document ... so each
> replica must index it independently.
>
> The old-style master-slave replication (which existed long before
> SolrCloud) copies the finished Lucene segments, so only the master
> actually does indexing.
>
> SolrCloud doesn't have a master, only multiple replicas, one of which is
> elected leader, and replication only comes into the picture if there's a
> serious problem and Solr determines that it can't use the transaction
> log to recover the index.
>
> Thanks,
> Shawn
>
>


-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251


Re: JSON Facet & Analytics API in Solr 5.1

2015-05-07 Thread Frank li
Hi Yonik,

I am reading your blog. It is helpful. One question for you about the
following example,

curl http://localhost:8983/solr/query -d 'q=*:*&rows=0&
 json.facet={
   categories:{
 type : terms,
 field : cat,
 sort : { x : desc},
 facet:{
   x : "avg(price)",
   y : "sum(price)"
 }
   }
 }
'


If I want to write it in the format of this:
http://localhost:8983/solr/query?q=apple&json.facet={x:'avg(campaign_ult_defendant_cnt_is)'},
how do I do that?

Thanks,

Frank


On Mon, Apr 20, 2015 at 7:35 AM, Davis, Daniel (NIH/NLM) [C] <
daniel.da...@nih.gov> wrote:

> Indeed - XML is not "human readable" if it contains colons, JSON is not
> "human readable" if it is too deep, and the objects/keys are not semantic.
> I also vote for flatter.
>
> -Original Message-
> From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com]
> Sent: Friday, April 17, 2015 11:16 PM
> To: solr-user@lucene.apache.org
> Subject: Re: JSON Facet & Analytics API in Solr 5.1
>
> Flatter please.  The other nested stuff makes my head hurt.  Until
> recently I thought I was the only person on the planet who had a hard time
> mentally parsing anything but the simplest JSON, but then I learned that
> I'm not alone at all; it's just that nobody is saying it. :)
>
> Otis
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>
>
> On Fri, Apr 17, 2015 at 7:26 PM, Trey Grainger  wrote:
>
> > Agreed, I also prefer the second way. I find it more readable, less
> > verbose while communicating the same information, less confusing to
> > mentally parse ("is 'terms' the name of my facet, or the type of my
> > facet?..."), and less prone to syntactically valid, but logically
> > invalid inputs.  Let's break those topics down.
> >
> > *1) Less verbose while communicating the same information:* The
> > flatter structure is particularly useful when you have nested facets
> > to reduce unnecessary verbosity / extra levels. Let's contrast the two
> > approaches with just 2 levels of subfacets:
> >
> > ** Current Format **
> > top_genres:{
> > terms:{
> > field: genre,
> > limit: 5,
> > facet:{
> > top_authors:{
> > terms:{
> > field: author,
> > limit: 4,
> > facet: {
> > top_books:{
> > terms:{
> > field: title,
> > limit: 5
> >}
> >}
> > }
> > }
> > }
> > }
> > }
> > }
> >
> > ** Flat Format **
> > top_genres:{
> > type: terms,
> > field: genre,
> > limit: 5,
> > facet:{
> > top_authors:{
> > type: terms
> > field: author,
> > limit: 4,
> > facet: {
> > top_books:{
> > type: terms
> > field: title,
> > limit: 5
> >}
> > }
> > }
> > }
> > }
> >
> > The flat format is clearly shorter and more succinct, while
> > communicating the same information. What value do the extra levels add?
> >
> >
> > *2) Less confusing to mentally parse*
> > I also find the flatter structure less confusing, as I'm consistently
> > having to take a mental pause with the current format to verify
> > whether "terms" is the name of my facet or the type of my facet and
> > have to count the curly braces to figure this out.  Not that I would
> > name my facets like this, but to give an extreme example of why that
> > extra mental calculation is necessary due to the name of an attribute
> > in the structure being able to represent both a facet name and facet
> type:
> >
> > terms: {
> > terms: {
> > field: genre,
> > limit: 5,
> > facet: {
> > terms: {
> > terms:{
> > field: author
> > limit: 4
> > }
> > }
> > }
> > }
> > }
> >
> > In this example, the first "terms" is a facet name, the second "terms"
> > is a facet type, the third is a facet name, etc. Even if you don't
> > name your facets like this, it still requires parsing someone else's
> > query mentally to ensure that's not what was done.
> >
> > 3) *Less prone to syntactically valid, but logically invalid inputs*
> > Also, given this first format (where the type is indicated by one of
> > several possible attributes: terms, range, etc.), what happens if I
> > pass in multiple of the valid JSON attributes... the flatter structure
> > prevents this from being possible (which is a good thing!):
> >
> > top_authors : {
> > terms : {
> > field : author,
> > limit : 5
> > },
> > range : {
> > field : price,
> > start : 0,
> >  

Re: Getting error while building Solr

2015-05-07 Thread Erick Erickson
Upon occasion I've had to purge my ivy cache. On a Mac just 'rm -rf
~/.ivy2/cache'

It's brute force, and you'll download a lot of jars when you next compile,
but it's simple to test.
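The paths in the log suggest Windows; the equivalent there would be
something like 'rmdir /s /q %USERPROFILE%\.ivy2\cache'.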

Erick

On Thu, May 7, 2015 at 11:56 AM, Aniket Kumar <008aniketku...@gmail.com> wrote:
> Hi ,
>
> I am getting below exception while running ant compile command.
>
> [ivy:retrieve] ERROR: impossible to acquire lock for
> net.sourceforge.nekohtml#nekohtml;1.9.17
> [ivy:retrieve]  tried
> C:\Users\ani\.ivy2\local\net.sourceforge.nekohtml\nekohtml\1.9.17\ivys\ivy.xml
> [ivy:retrieve]  tried
> C:\Users\ani\.ivy2\local\net.sourceforge.nekohtml\nekohtml\1.9.17\jars\nekohtml.jar
> [ivy:retrieve]  local: no ivy file nor artifact found for
> net.sourceforge.nekohtml#nekohtml;1.9.17
> [ivy:retrieve] main: Checking cache for: dependency:
> net.sourceforge.nekohtml#nekohtml;1.9.17 {compile=[master]}
> [ivy:retrieve] ERROR: impossible to acquire lock for
> net.sourceforge.nekohtml#nekohtml;1.9.17
> [ivy:retrieve] ERROR: impossible to acquire lock for
> net.sourceforge.nekohtml#nekohtml;1.9.17
> [ivy:retrieve]  tried
> C:\Users\ani\.ivy2\shared\net.sourceforge.nekohtml\nekohtml\1.9.17\ivys\ivy.xml
> [ivy:retrieve]  tried
> C:\Users\ani\.ivy2\shared\net.sourceforge.nekohtml\nekohtml\1.9.17\jars\nekohtml.jar
> [ivy:retrieve]  shared: no ivy file nor artifact found for
> net.sourceforge.nekohtml#nekohtml;1.9.17
> [ivy:retrieve] ERROR: impossible to acquire lock for
> net.sourceforge.nekohtml#nekohtml;1.9.17
> [ivy:retrieve]  tried
> https://repo1.maven.org/maven2/net/sourceforge/nekohtml/nekohtml/1.9.17/nekohtml-1.9.17.pom
> [ivy:retrieve]  public: found md file for
> net.sourceforge.nekohtml#nekohtml;1.9.17
> [ivy:retrieve]  =>
> https://repo1.maven.org/maven2/net/sourceforge/nekohtml/nekohtml/1.9.17/nekohtml-1.9.17.pom
> (1.9.17)
>
>
> 
>
>
>
>
> [ivy:retrieve]   working-chinese-mirror: tried
> [ivy:retrieve]
> http://uk.maven.org/maven2/net/sourceforge/nekohtml/nekohtml/1.9.17/nekohtml-1.9.17.pom
> [ivy:retrieve]  ::
> [ivy:retrieve]  ::  UNRESOLVED DEPENDENCIES ::
> [ivy:retrieve]  ::
> [ivy:retrieve]  :: net.sourceforge.nekohtml#nekohtml;1.9.17: not
> found
> [ivy:retrieve]  ::
> [ivy:retrieve]
> [ivy:retrieve]  ERRORS
> [ivy:retrieve]  impossible to acquire lock for
> net.sourceforge.nekohtml#nekohtml;1.9.17
> [ivy:retrieve]  impossible to acquire lock for
> net.sourceforge.nekohtml#nekohtml;1.9.17
> [ivy:retrieve]  impossible to acquire lock for
> net.sourceforge.nekohtml#nekohtml;1.9.17
> [ivy:retrieve]  impossible to acquire lock for
> net.sourceforge.nekohtml#nekohtml;1.9.17
> [ivy:retrieve]  impossible to acquire lock for
> net.sourceforge.nekohtml#nekohtml;1.9.17
> [ivy:retrieve]  impossible to acquire lock for
> net.sourceforge.nekohtml#nekohtml;1.9.17
> [ivy:retrieve]  impossible to acquire lock for
> net.sourceforge.nekohtml#nekohtml;1.9.17
> [ivy:retrieve]  impossible to acquire lock for
> net.sourceforge.nekohtml#nekohtml;1.9.17
> [ivy:retrieve]  impossible to acquire lock for
> net.sourceforge.nekohtml#nekohtml;1.9.17
> [ivy:retrieve]  impossible to acquire lock for
> net.sourceforge.nekohtml#nekohtml;1.9.17
> [ivy:retrieve]  impossible to acquire lock for
> net.sourceforge.nekohtml#nekohtml;1.9.17
> [ivy:retrieve]  impossible to acquire lock for
> net.sourceforge.nekohtml#nekohtml;1.9.17
> [ivy:retrieve]
> [ivy:retrieve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
>[subant] Exiting
> G:\solar\workspace\SolrDevPro\lucene\benchmark\build.xml.
>[subant] Exiting G:\solar\workspace\SolrDevPro\lucene\build.xml.
>
> BUILD FAILED
> G:\solar\workspace\SolrDevPro\build.xml:147: The following error occurred
> while executing this line:
> G:\solar\workspace\SolrDevPro\lucene\build.xml:123: The following error
> occurred while executing this line:
> G:\solar\workspace\SolrDevPro\lucene\common-build.xml:2145: The following
> error occurred while executing this line:
> G:\solar\workspace\SolrDevPro\lucene\common-build.xml:414: impossible to
> resolve dependencies:
> resolve failed - see output for details
>
>
>
>
> I am not working under any firewall/proxy. I tried 'ant clean jar' as well,
> but that did not work.
>
> Please suggest.
>
>
>
> Thanks,
> Aniket


Re: Trying to get AnalyzingInfixSuggester to work in Solr?

2015-05-07 Thread O. Olson
Thank you Erick. I'm sorry I did not mention this earlier, but I am still on
Solr 4.10.3. Once I upgrade to Solr 5.0+, I will consider the suggestion in
your blog post.
O. O. 


Erick Erickson wrote
> Uh, you mean because I forgot to paste in the URL? Sigh...
> 
> Anyway, the URL is irrelevant now that you've solved your problem, but
> in case you're interested:
> http://lucidworks.com/blog/solr-suggester/
> 
> Sorry for the confusion.
> Erick





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Trying-to-get-AnalyzingInfixSuggester-to-work-in-Solr-tp4204163p4204392.html
Sent from the Solr - User mailing list archive at Nabble.com.


Getting error while building Solr

2015-05-07 Thread Aniket Kumar
Hi ,

I am getting below exception while running ant compile command.

[ivy:retrieve] ERROR: impossible to acquire lock for
net.sourceforge.nekohtml#nekohtml;1.9.17
[ivy:retrieve]  tried
C:\Users\ani\.ivy2\local\net.sourceforge.nekohtml\nekohtml\1.9.17\ivys\ivy.xml
[ivy:retrieve]  tried
C:\Users\ani\.ivy2\local\net.sourceforge.nekohtml\nekohtml\1.9.17\jars\nekohtml.jar
[ivy:retrieve]  local: no ivy file nor artifact found for
net.sourceforge.nekohtml#nekohtml;1.9.17
[ivy:retrieve] main: Checking cache for: dependency:
net.sourceforge.nekohtml#nekohtml;1.9.17 {compile=[master]}
[ivy:retrieve] ERROR: impossible to acquire lock for
net.sourceforge.nekohtml#nekohtml;1.9.17
[ivy:retrieve] ERROR: impossible to acquire lock for
net.sourceforge.nekohtml#nekohtml;1.9.17
[ivy:retrieve]  tried
C:\Users\ani\.ivy2\shared\net.sourceforge.nekohtml\nekohtml\1.9.17\ivys\ivy.xml
[ivy:retrieve]  tried
C:\Users\ani\.ivy2\shared\net.sourceforge.nekohtml\nekohtml\1.9.17\jars\nekohtml.jar
[ivy:retrieve]  shared: no ivy file nor artifact found for
net.sourceforge.nekohtml#nekohtml;1.9.17
[ivy:retrieve] ERROR: impossible to acquire lock for
net.sourceforge.nekohtml#nekohtml;1.9.17
[ivy:retrieve]  tried
https://repo1.maven.org/maven2/net/sourceforge/nekohtml/nekohtml/1.9.17/nekohtml-1.9.17.pom
[ivy:retrieve]  public: found md file for
net.sourceforge.nekohtml#nekohtml;1.9.17
[ivy:retrieve]  =>
https://repo1.maven.org/maven2/net/sourceforge/nekohtml/nekohtml/1.9.17/nekohtml-1.9.17.pom
(1.9.17)







[ivy:retrieve]   working-chinese-mirror: tried
[ivy:retrieve]
http://uk.maven.org/maven2/net/sourceforge/nekohtml/nekohtml/1.9.17/nekohtml-1.9.17.pom
[ivy:retrieve]  ::
[ivy:retrieve]  ::  UNRESOLVED DEPENDENCIES ::
[ivy:retrieve]  ::
[ivy:retrieve]  :: net.sourceforge.nekohtml#nekohtml;1.9.17: not
found
[ivy:retrieve]  ::
[ivy:retrieve]
[ivy:retrieve]  ERRORS
[ivy:retrieve]  impossible to acquire lock for
net.sourceforge.nekohtml#nekohtml;1.9.17
[ivy:retrieve]  impossible to acquire lock for
net.sourceforge.nekohtml#nekohtml;1.9.17
[ivy:retrieve]  impossible to acquire lock for
net.sourceforge.nekohtml#nekohtml;1.9.17
[ivy:retrieve]  impossible to acquire lock for
net.sourceforge.nekohtml#nekohtml;1.9.17
[ivy:retrieve]  impossible to acquire lock for
net.sourceforge.nekohtml#nekohtml;1.9.17
[ivy:retrieve]  impossible to acquire lock for
net.sourceforge.nekohtml#nekohtml;1.9.17
[ivy:retrieve]  impossible to acquire lock for
net.sourceforge.nekohtml#nekohtml;1.9.17
[ivy:retrieve]  impossible to acquire lock for
net.sourceforge.nekohtml#nekohtml;1.9.17
[ivy:retrieve]  impossible to acquire lock for
net.sourceforge.nekohtml#nekohtml;1.9.17
[ivy:retrieve]  impossible to acquire lock for
net.sourceforge.nekohtml#nekohtml;1.9.17
[ivy:retrieve]  impossible to acquire lock for
net.sourceforge.nekohtml#nekohtml;1.9.17
[ivy:retrieve]  impossible to acquire lock for
net.sourceforge.nekohtml#nekohtml;1.9.17
[ivy:retrieve]
[ivy:retrieve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
   [subant] Exiting
G:\solar\workspace\SolrDevPro\lucene\benchmark\build.xml.
   [subant] Exiting G:\solar\workspace\SolrDevPro\lucene\build.xml.

BUILD FAILED
G:\solar\workspace\SolrDevPro\build.xml:147: The following error occurred
while executing this line:
G:\solar\workspace\SolrDevPro\lucene\build.xml:123: The following error
occurred while executing this line:
G:\solar\workspace\SolrDevPro\lucene\common-build.xml:2145: The following
error occurred while executing this line:
G:\solar\workspace\SolrDevPro\lucene\common-build.xml:414: impossible to
resolve dependencies:
resolve failed - see output for details




I am not working under any firewall/proxy. I tried 'ant clean jar' as well,
but that did not work.

Please suggest.



Thanks,
Aniket


Re: oracle and java.sql.SQLException: READ_COMMITTED and SERIALIZABLE are the only valid transaction levels

2015-05-07 Thread Shawn Heisey
On 5/7/2015 11:59 AM, Siarhei Padolski wrote:
> I’m trying to import data using a read-only account to an Oracle database.
> My data-config.xml is:
>
> <dataConfig>
>   <dataSource type="JdbcDataSource" ...
>      url="jdbc:oracle:oci8:@//localhost:10010:CONNECTION"
>      user="ATLAS_PANDABIGMON_R"
>      password="Lutini72"
>      readOnly="true"
>      autoCommit="false" batchSize="100"
>      />
>   <document>
>     <entity name="job" query="...">
>       ...
>     </entity>
>   </document>
> </dataConfig>



> Caused by: java.sql.SQLException: READ_COMMITTED and SERIALIZABLE are the 
> only valid transaction levels
> at 
> oracle.jdbc.driver.PhysicalConnection.setTransactionIsolation(PhysicalConnection.java:3301)
> at 
> org.apache.solr.handler.dataimport.JdbcDataSource$1.initializeConnection(JdbcDataSource.java:180)
> at 
> org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:158)
> ... 16 more
>

Line 180 of JdbcDataSource.java in current code is this:

  c.setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED);

This transaction isolation level is apparently not allowed by your
Oracle server.

That code line is only executed if the "readOnly" parameter is in the
config.  Try taking that parameter out of your DIH config file.  Because the
SQL statement in your DIH config is a SELECT, enforcing readOnly won't be
required.  DIH will only ever use the SQL statements that you provide. 
If the Oracle user you've indicated only has read-only access, then
that's another reason to leave readOnly out.
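In other words, something like this (readOnly removed; the password is
elided here for the reason below):

<dataSource type="JdbcDataSource" ...
   url="jdbc:oracle:oci8:@//localhost:10010:CONNECTION"
   user="ATLAS_PANDABIGMON_R"
   password="..."
   autoCommit="false" batchSize="100"
   />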

You've included a password in the config you sent to the list.  You
might want to change that password on any system that contains it ... a
lot of people have now seen it, and it will live forever on the Internet
in several archives of this mailing list.

Thanks,
Shawn



Solr Multilingual Indexing with one field- Guidance

2015-05-07 Thread Kuntal Ganguly
Our current production index size is 1.5 TB with 3 shards. Currently we
have the following field type:

And the above field type is working well for the US and English language
clients.

Now we have some new Chinese and Japanese clients, so I googled for the best
approach to a multilingual index:

http://www.basistech.com/indexing-strategies-for-multilingual-search-with-solr-and-rosette/

https://docs.lucidworks.com/display/lweug/Multilingual+Indexing+and+Search

There seem to be pros and cons associated with every approach.

Then I tried some R&D with a single-field approach, and here's my new field
type:

I have kept the same tokenizer and only changed the filters. It is working
well with all existing searches/use cases for English documents, as well as
the new use cases for Chinese/Japanese documents.

Now I have the following questions for the Solr experts/developers:

1) Is this the correct approach, or am I missing something?

2) Can you give me an example where there will be a problem with this new
field type? A use case/scenario with an example would be very helpful.

3) Also, will there be any problem in the future as different clients come
on board?

Please provide some guidance.


oracle and java.sql.SQLException: READ_COMMITTED and SERIALIZABLE are the only valid transaction levels

2015-05-07 Thread Siarhei Padolski
Hello,

I’m trying to import data using a read-only account to an Oracle database.
My data-config.xml is:

<dataConfig>
  <dataSource type="JdbcDataSource" ...
     url="jdbc:oracle:oci8:@//localhost:10010:CONNECTION"
     user="ATLAS_PANDABIGMON_R"
     password="Lutini72"
     readOnly="true"
     autoCommit="false" batchSize="100"
     />
  <document>
    <entity name="job" query="...">
      ...
    </entity>
  </document>
</dataConfig>


I’m getting:


Exception while processing: job document : SolrInputDocument(fields: []):org.apache.solr.handler.dataimport.DataImportHandlerException: Exception initializing SQL connection Processing Document # 1
at org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:166)
at org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:133)
at org.apache.solr.handler.dataimport.JdbcDataSource.getConnection(JdbcDataSource.java:402)
at org.apache.solr.handler.dataimport.JdbcDataSource.access$200(JdbcDataSource.java:44)
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:270)
at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:240)
at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:44)
at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:58)
at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:72)
at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:475)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:414)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:329)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)
Caused by: java.sql.SQLException: READ_COMMITTED and SERIALIZABLE are the only valid transaction levels
at oracle.jdbc.driver.PhysicalConnection.setTransactionIsolation(PhysicalConnection.java:3301)
at org.apache.solr.handler.dataimport.JdbcDataSource$1.initializeConnection(JdbcDataSource.java:180)
at org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:158)
... 16 more




Could you help me with this issue?

Thanks a lot.
With best regards,
Sergey




Re: Negative Boosting documents with a certain word

2015-05-07 Thread Chris Hostetter

: Right now, I specify the boost for my request handler as:
: 
:   ...
:   <str name="boost">ln(qty)</str>
:   ...
: 
: Is there a way to specify this boost in the Solrconfig.xml?
: 
: I tried: (*:* -Refurbished)^10   and I get the
: following exception: 
: 
: ERROR - 2015-05-01 15:13:41.609; org.apache.solr.common.SolrException;
: org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError:
: Expected identifier at pos 0 str='(*:* -Refurbished)^10'


that's because the "boost" option on the edismax parser expects a 
function, not a query...

https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser

try adding a "bq" param...

  <str name="bq">(*:* -Refurbished -foo -bar -baz)^10</str>
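(The same thing can also be sent per request as a plain parameter, e.g.
&bq=(*:* -Refurbished)^10, URL-encoded as needed, assuming defType=edismax
is in effect.)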



-Hoss
http://www.lucidworks.com/


Re: Trying to get AnalyzingInfixSuggester to work in Solr?

2015-05-07 Thread O. Olson
Thank you Rajesh, Alessandro and Erick. I apparently did not have much
knowledge about the Suggester - in fact I had no clue that there is a
difference between the SpellcheckComponent and the SuggestComponent. 

I will be reading about this, especially Erick's blog post on Lucidworks.

O. O. 


Rajesh Hazari wrote
> Good to know that it's working as expected.
> 
> I have a couple of questions about your autosuggest implementation.
> 
> I see that you are using SpellcheckComponent instead of SuggestComponent.
> Are you using this intentionally? If not, please read this:
>  https://cwiki.apache.org/confluence/display/solr/Suggester
> 
> I am working on an issue in the suggester; just sharing it once again with
> this community in case you or any others have it on your list.
> 
> http://stackoverflow.com/questions/27847707/solr-autosuggest-to-stop-filter-suggesting-the-phrase-that-ends-with-stopwords
> 
> *thanks,*
> *Rajesh**.*





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Trying-to-get-AnalyzingInfixSuggester-to-work-in-Solr-tp4204163p4204356.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr 3.6.2 under tomcat 8 missing corename in path

2015-05-07 Thread Tim Dunphy
Hi Shawn,


> The URL must include the core name.  Your defaultCoreName is
> collection1, and I'm guessing you don't have a core named collection1.
> Try browsing to just /solr instead of /solr/admin ... you should get a
> list of links for valid cores, each of which will take you to the admin
> page for that core.
> Probably what you will find is that when you click on one of those
> links, you will end up on /solr/corename/admin.jsp as the URL in your
> browser.


When I browse to /solr I see a link that points me to /solr/admin. And when
I click on that link I see the error:

*missing core name in path*


I think my problem is that I am not listing the cores correctly in
the solr.xml file.

This is what I have in my solr.xml file:

<solr persistent="false">
  <cores adminPath="/admin/cores" defaultCoreName="collection1">
    ...
  </cores>
</solr>

So what I did was create a directory at solr/admin/cores and put
collection1 there:

[root@aoadbld00032la solr]# ls -ld admin/cores/collection1
drwxr-xr-x. 5 root root 4096 May  6 17:29 admin/cores/collection1

So, if I assume correctly that the way I reference the collection1
directory is the problem, how can I express this differently in my solr.xml
file so that it works?

Thanks,
Tim



On Wed, May 6, 2015 at 8:00 PM, Shawn Heisey  wrote:

> On 5/6/2015 2:29 PM, Tim Dunphy wrote:
> > I'm trying to setup an old version of Solr for one of our drupal
> > developers. Apparently only versions 1.x or 3.x will work with the
> current
> > version of drupal.
> >
> > I'm setting up solr 3.4.2 under tomcat.
> >
> > And I'm getting this error when I start tomcat and surf to the
> /solr/admin
> > URL:
> >
> >  HTTP Status 404 - missing core name in path
> >
> > type Status report
> >
> > message missing core name in path
> >
> > description The requested resource is not available.
>
> The URL must include the core name.  Your defaultCoreName is
> collection1, and I'm guessing you don't have a core named collection1.
>
> Try browsing to just /solr instead of /solr/admin ... you should get a
> list of links for valid cores, each of which will take you to the admin
> page for that core.
>
> Probably what you will find is that when you click on one of those
> links, you will end up on /solr/corename/admin.jsp as the URL in your
> browser.
>
> Thanks,
> Shawn
>
>


-- 
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B


Re: Limit the documents for each shard in solr cloud

2015-05-07 Thread Daniel Collins
Jilani, you did say "My team needs that option if at all possible", so my
first response would be "why?"  Why do they want to limit the number of
documents per shard; what's the rationale/use case behind that
requirement?  Once we understand that, we can explain why it's a bad idea. :)

I suspect I'm re-iterating Jack's comments, but why are you sharding in the
first place? 8 shards split across 4 machines, so 2 shards per machine.
But you have 2 replicas of each shard, so you have 16 Solr cores, and hence
4 Solr cores per machine?  Since you need an instance of all 8 shards to be
up in order to service requests, you can get away with everything on 2
machines, but you still have 8 Solr cores to manage in order to have a
fully functioning system.  What's the benefit of sharding in this
scenario?  Sharding adds complexity, so you normally only add sharding if
your search times are too slow without it.

You need to work out how much disk space the whole 20m docs is going to
take (maybe index 1m or 5m docs and extrapolate if they are all equivalent
in size), then split it across 4 machines.  But as Erick points out you
need to allow for merges to occur, so whatever the space of the "static"
data set, you need to allow for double that from time to time if background
merges are happening.


On 7 May 2015 at 16:05, Jack Krupansky  wrote:

> A leader is also a replica - SolrCloud is not a master/slave architecture.
> Any replica can be elected to be the leader, but that is only temporary and
> can change over time.
>
> You can place multiple shards on a single node, but was that really your
> intention?
>
> Generally, number of nodes equals number of shards times the replication
> factor. But then divided by shards per node if you do place more than one
> shard per node.
>
> -- Jack Krupansky
>
> On Thu, May 7, 2015 at 1:29 AM, Jilani Shaik 
> wrote:
>
> > Hi,
> >
> > Is it possible to restrict number of documents per shard in Solr cloud?
> >
> > Lets say we have Solr cloud with 4 nodes, and on each node we have one
> > leader and one replica. Like wise total we have 8 shards that includes
> > replicas. Now I need to index my documents in such a way that each shard
> > will have only 5 million documents. Total documents in Solr cloud should
> be
> > 20 million documents.
> >
> >
> > Thanks,
> > Jilani
> >
>


Re: Trying to get AnalyzingInfixSuggester to work in Solr?

2015-05-07 Thread Erick Erickson
Uh, you mean because I forgot to paste in the URL? Sigh...

Anyway, the URL is irrelevant now that you've solved your problem, but
in case you're interested:
http://lucidworks.com/blog/solr-suggester/

Sorry for the confusion.
Erick

On Thu, May 7, 2015 at 9:12 AM, Alessandro Benedetti
 wrote:
> When working with suggesters I "suggest" taking a deep look at this guide:
> http://lucidworks.com/blog/solr-suggester/
>
> It was really helpful.
>
> Cheers
>
> 2015-05-07 16:58 GMT+01:00 Rajesh Hazari :
>
>> Good to know that it's working as expected.
>>
>> I have a couple of questions about your autosuggest implementation.
>>
>> I see that you are using SpellcheckComponent instead of SuggestComponent.
>> Are you using this intentionally? If not, please read this:
>>  https://cwiki.apache.org/confluence/display/solr/Suggester
>>
>> I am working on an issue in the suggester; just sharing it once again with
>> this community in case you or any others have it on your list.
>>
>>
>> http://stackoverflow.com/questions/27847707/solr-autosuggest-to-stop-filter-suggesting-the-phrase-that-ends-with-stopwords
>>
>> *thanks,*
>> *Rajesh**.*
>>
>> On Thu, May 7, 2015 at 11:26 AM, O. Olson  wrote:
>>
>> > Thank you Erick. I have no clue what you are referring to when you used
>> > the word "this".  Are you referring to my question in my original
>> > email/message?
>> >
>> >
>> > Erick Erickson wrote
>> > > Have you seen this? I tried to make something end-to-end with assorted
>> > > "gotchas" identified
>> > >
>> > >  Best,
>> > > Erick
>> >
>> >
>> >
>> >
>> >
>> > --
>> > View this message in context:
>> >
>> http://lucene.472066.n3.nabble.com/Trying-to-get-AnalyzingInfixSuggester-to-work-in-Solr-tp4204163p4204336.html
>> > Sent from the Solr - User mailing list archive at Nabble.com.
>> >
>>
>
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England


"I was asked to wait on state recovering for shard.... but I still do not see the request state"

2015-05-07 Thread adfel70
Hi
I have a cluster of 16 shards, 3 replicas.

I keep getting situations where a whole shard breaks.
the leader is at down state and says:
I was asked to wait on state recovering for shard but i still do not see
the requested state. I see state: recovering live:true leader from
ZK:http://...

The replicas are in the recovering state; they keep failing on recovery and
put the same exception in the log.

any idea?

I use solr 4.10.3

Thanks.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/I-was-asked-to-wait-on-state-recovering-for-shard-but-I-still-do-not-see-the-request-state-tp4204348.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: getting frequent CorruptIndexException and inconsistent data though core is active

2015-05-07 Thread adfel70
Does anyone have any input on this?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/getting-frequent-CorruptIndexException-and-inconsistent-data-though-core-is-active-tp4204129p4204347.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Trying to get AnalyzingInfixSuggester to work in Solr?

2015-05-07 Thread Alessandro Benedetti
When working with suggesters I "suggest" taking a deep look at this guide:
http://lucidworks.com/blog/solr-suggester/

It was really helpful.

Cheers

2015-05-07 16:58 GMT+01:00 Rajesh Hazari :

> Good to know that it's working as expected.
>
> I have a couple of questions about your autosuggest implementation.
>
> I see that you are using SpellcheckComponent instead of SuggestComponent.
> Are you using this intentionally? If not, please read this:
>  https://cwiki.apache.org/confluence/display/solr/Suggester
>
> I am working on an issue in the suggester; just sharing it once again with
> this community in case you or any others have it on your list.
>
>
> http://stackoverflow.com/questions/27847707/solr-autosuggest-to-stop-filter-suggesting-the-phrase-that-ends-with-stopwords
>
> *thanks,*
> *Rajesh**.*
>
> On Thu, May 7, 2015 at 11:26 AM, O. Olson  wrote:
>
> > Thank you Erick. I have no clue what you are referring to when you used
> > the word "this".  Are you referring to my question in my original
> > email/message?
> >
> >
> > Erick Erickson wrote
> > > Have you seen this? I tried to make something end-to-end with assorted
> > > "gotchas" identified
> > >
> > >  Best,
> > > Erick
> >
> >
> >
> >
> >
> > --
> > View this message in context:
> >
> http://lucene.472066.n3.nabble.com/Trying-to-get-AnalyzingInfixSuggester-to-work-in-Solr-tp4204163p4204336.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Trying to get AnalyzingInfixSuggester to work in Solr?

2015-05-07 Thread Rajesh Hazari
Good to know that it's working as expected.

I have a couple of questions about your autosuggest implementation.

I see that you are using SpellcheckComponent instead of SuggestComponent.
Are you using this intentionally? If not, please read this:
 https://cwiki.apache.org/confluence/display/solr/Suggester

I am working on an issue in the suggester; just sharing it once again with
this community in case you or any others have it on your list.

http://stackoverflow.com/questions/27847707/solr-autosuggest-to-stop-filter-suggesting-the-phrase-that-ends-with-stopwords

*thanks,*
*Rajesh**.*

On Thu, May 7, 2015 at 11:26 AM, O. Olson  wrote:

> Thank you Erick. I have no clue what you are referring to when you used
> the word "this".  Are you referring to my question in my original
> email/message?
>
>
> Erick Erickson wrote
> > Have you seen this? I tried to make something end-to-end with assorted
> > "gotchas" identified
> >
> >  Best,
> > Erick
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Trying-to-get-AnalyzingInfixSuggester-to-work-in-Solr-tp4204163p4204336.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Specify HTTP instead of AJP on tomcat

2015-05-07 Thread Shawn Heisey
On 5/7/2015 9:24 AM, Aki Balogh wrote:
> I'm seeing the following error while indexing:
>
> May 06, 2015 10:52:32 PM org.apache.jk.common.MsgAjp processHeader
> SEVERE: BAD packet signature 18245
> May 06, 2015 10:52:32 PM org.apache.jk.common.ChannelSocket
> processConnection
> SEVERE: Error, processing connection
> java.lang.IndexOutOfBoundsException
> at java.io.BufferedInputStream.read(BufferedInputStream.java:338)
> at org.apache.jk.common.ChannelSocket.read(ChannelSocket.java:628)
> at
> org.apache.jk.common.ChannelSocket.receive(ChannelSocket.java:585)
> at
> org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java:693)
> at
> org.apache.jk.common.ChannelSocket$SocketConnection.runIt(ChannelSocket.java:898)
> at
> org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:690)
> at java.lang.Thread.run(Thread.java:745)
>
>
>
> After doing some digging, I found that tomcat's AJP connector is to blame:
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201101.mbox/%3c1296129324.29340.1417510...@webmail.messagingengine.com%3E
>
>
>
> The post specifies Java code to post via http, but in my code, I'm not
> using a library. I'm just doing GET/POST/DELETE to solr via http: "
> http://127.0.0.1:8080/solr";
>
>
> How can I specify that solr use tomcat's http connector, not the AJP
> connector?

Solr cannot dictate which connector is used.  The exception you are
seeing is purely in tomcat code -- the request has not made it to Solr
at all.

You choose which connector is used when you set up Tomcat's configuration
and pick the port where you send your request -- send it to the port
specified in the HTTP connector.  I believe that a default Tomcat config
normally uses 8080 for HTTP and 8009 for AJP.

Thanks,
Shawn
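
For reference, the relevant connectors in Tomcat's conf/server.xml look
roughly like this (a sketch with the stock default ports; your install may
differ):

<!-- HTTP connector: point your GET/POST/DELETE requests at this port -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000" redirectPort="8443" />

<!-- AJP connector: only for a fronting web server (mod_jk / mod_proxy_ajp),
     never for raw HTTP traffic -->
<Connector port="8009" protocol="AJP/1.3" redirectPort="8443" />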



Re: Trying to get AnalyzingInfixSuggester to work in Solr?

2015-05-07 Thread O. Olson
Thank you, Erick. I have no clue what you are referring to with the word
"this". Are you referring to the question in my original email/message?


Erick Erickson wrote
> Have you seen this? I tried to make something end-to-end with assorted
> "gotchas" identified
> 
>  Best,
> Erick





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Trying-to-get-AnalyzingInfixSuggester-to-work-in-Solr-tp4204163p4204336.html
Sent from the Solr - User mailing list archive at Nabble.com.


Specify HTTP instead of AJP on tomcat

2015-05-07 Thread Aki Balogh
Hello,

I'm seeing the following error while indexing:

May 06, 2015 10:52:32 PM org.apache.jk.common.MsgAjp processHeader
SEVERE: BAD packet signature 18245
May 06, 2015 10:52:32 PM org.apache.jk.common.ChannelSocket
processConnection
SEVERE: Error, processing connection
java.lang.IndexOutOfBoundsException
at java.io.BufferedInputStream.read(BufferedInputStream.java:338)
at org.apache.jk.common.ChannelSocket.read(ChannelSocket.java:628)
at
org.apache.jk.common.ChannelSocket.receive(ChannelSocket.java:585)
at
org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java:693)
at
org.apache.jk.common.ChannelSocket$SocketConnection.runIt(ChannelSocket.java:898)
at
org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:690)
at java.lang.Thread.run(Thread.java:745)



After doing some digging, I found that tomcat's AJP connector is to blame:
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201101.mbox/%3c1296129324.29340.1417510...@webmail.messagingengine.com%3E



The post specifies Java code to post via http, but in my code, I'm not
using a library. I'm just doing GET/POST/DELETE to solr via http: "
http://127.0.0.1:8080/solr";


How can I specify that solr use tomcat's http connector, not the AJP
connector?

Thanks,
Aki


Re: Trying to get AnalyzingInfixSuggester to work in Solr?

2015-05-07 Thread O. Olson
Thank you, Rajesh, for your persistence. I've now got it to work. In my original
email/message, I mentioned that I use 'text_general' as defined in the
examples:
http://svn.apache.org/viewvc/lucene/dev/trunk/solr/example/example-DIH/solr/db/conf/schema.xml?view=markup
 
I'm sorry I did not mention this again later. 

Your definition of 'text_general' is a lot different from what's in the
examples. However, once I used it, I got this to work just as you said. 

Thank you,
O. O.


Rajesh Hazari wrote
> yes "textSuggest" is of type "text_general" with below definition
>   positionIncrementGap="100" sortMissingLast="true" omitNorms="true">
>  
> 
> 
> 
> 
> 
> 
>  protected="protwords.txt"/>
>  outputUnigrams="true"/>
>   
> 
>   
> 
>  mapping="mapping-FoldToASCII.txt"/>
>  
> 
> 
> 
> 
> 
>  protected="protwords.txt"/>
>  outputUnigrams="true"/>
>   
> 
> 
> 
> *Rajesh.*





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Trying-to-get-AnalyzingInfixSuggester-to-work-in-Solr-tp4204163p4204334.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Limit the documents for each shard in solr cloud

2015-05-07 Thread Jack Krupansky
A leader is also a replica - SolrCloud is not a master/slave architecture.
Any replica can be elected to be the leader, but that is only temporary and
can change over time.

You can place multiple shards on a single node, but was that really your
intention?

Generally, the number of nodes equals the number of shards times the
replication factor, divided by the shards per node if you place more than
one shard on a node. In your case that's 4 shards x 2 replicas / 2 shards
per node = 4 nodes.

-- Jack Krupansky

On Thu, May 7, 2015 at 1:29 AM, Jilani Shaik  wrote:

> Hi,
>
> Is it possible to restrict number of documents per shard in Solr cloud?
>
> Lets say we have Solr cloud with 4 nodes, and on each node we have one
> leader and one replica. Like wise total we have 8 shards that includes
> replicas. Now I need to index my documents in such a way that each shard
> will have only 5 million documents. Total documents in Solr cloud should be
> 20 million documents.
>
>
> Thanks,
> Jilani
>


Re: Limit the documents for each shard in solr cloud

2015-05-07 Thread Jack Krupansky
Wait a minute, guys... aren't we in the 21st century, where disk is
ultra-cheap and ultra-plentiful? So... what's the REAL problem here?
Seriously, when multi-terabyte drives are so common on servers and Solr
really doesn't work well with more than 100 to 250 million docs per server
anyway, which is way under needing terabytes, what could possibly be the
problem??!!

Or... is this really an SSD rather than a spinning disk issue? Possibly a
virtualization issue, where a single physical machine with only modest
physical SSD is virtualized into multiple virtual machines, but then each
virtual machine gets only a fairly tiny amount of SSD disk storage space?
Just guessing here

A little clarification is in order.

In any case, if you really only have such a limited amount of storage per
node, that probably simply means that you need more nodes.


-- Jack Krupansky

On Thu, May 7, 2015 at 9:51 AM, Erick Erickson 
wrote:

> bq: We will not be able to limit the documents per shard in SolrCloud,
> as Solr will accept all documents as long as there is space to index them.
>
> True, end of story ;).
>
> How does Solr know it will run out of space? It just hits an exception;
> there's really no "this doesn't look like it will fit, so let's not
> index it". But that's not really a problem, because you need at least
> as much free space on your disk as the index size to handle merges, so
> you'll run into many, many, many other problems before you fill up
> your disk.
>
> The hashing function that's used to distribute the documents across the
> shards has not had any reports of significant uneven distribution that
> I know of. So simply dividing the number of docs by the number of shards
> and assuming that number (+/- a very small number, < 1%) of docs will
> land on each shard is usually good enough. If you see something
> different, it would be good to know.
>
> Best,
> Erick
>
> On Thu, May 7, 2015 at 12:45 AM, Jilani Shaik 
> wrote:
> > Hi Daniel,
> >
> > Thanks for the detailed explanation.
> >
> > My understanding is also similar to yours: we should not impose a limit
> > on the number of documents a shard can index. It depends on the shard
> > routing provided by Solr, and I am not expecting any change to the
> > document routing process.
> >
> > My team needs that option if at all possible. Before saying "it is not
> > possible at the Solr end to limit the documents per shard", I just want
> > to get confirmation or some details. So I dropped the question here to
> > get answers.
> >
> > You mentioned "as long as it has sufficient space to index" - how will
> > Solr know or estimate whether it has sufficient space to index on a
> > particular shard, or on the entire cloud?
> >
> > Conclusion of my understanding:
> > We will not be able to limit the documents per shard in SolrCloud, as
> > Solr will accept all documents as long as there is space to index them.
> >
> > Please suggest.
> >
> > Thanks,
> > Jilani
> >
> > On Thu, May 7, 2015 at 12:41 PM, Daniel Collins 
> > wrote:
> >
> >> Not sure I understand your problem.  If you have 20m documents, and 8
> >> shards, then each shard is (broadly speaking) only going to have 2.5m
> docs
> >> each, so I don't follow the 5m limit. That is with the default
> >> routing/hashing; obviously you can write your own hash algorithm or you
> >> can shard at your application level.
> >>
> >> In terms of limiting documents in a shard, I'm not sure what purpose
> that
> >> would serve.  If for arguments sake you only had 2 shards, and a limit
> of
> >> 5m doccs per shard, what happens when you hit that limit?  If you have
> >> indexed 10m docs, and now you try to index one more, what would you
> expect
> >> to happen, would the system just reject any documents, should it try to
> >> shard to shard 1 but see that is full, and then fail-over to shard2
> instead
> >> (that's not going to work as sharding needs to be reproducible and the
> >> document was intended for shard 1)?
> >>
> >> Solr's basic premise would be to index what you gave it, as long as it
> has
> >> sufficient space to do that.  If you want to limit your index to 20m
> docs,
> >> that is probably better done at the application layer (but I still don't
> >> really see why you would want to do that).
> >>
> >> On 7 May 2015 at 06:29, Jilani Shaik  wrote:
> >>
> >> > Hi,
> >> >
> >> > Is it possible to restrict the number of documents per shard in
> >> > SolrCloud?
> >> >
> >> > Let's say we have a SolrCloud with 4 nodes, and on each node we have
> >> > one leader and one replica. Likewise, in total we have 8 shards,
> >> > including replicas. Now I need to index my documents in such a way
> >> > that each shard will have only 5 million documents. The total in the
> >> > SolrCloud should be 20 million documents.
> >> >
> >> >
> >> > Thanks,
> >> > Jilani
> >> >
> >>
>


Re: A defect in Schema API with Add a New Copy Field Rule?

2015-05-07 Thread Steven White
Thanks Steve and Yonik.  This now makes sense.  Updating the doc will be a
big help.

With regard to deleting a copy-field, what I found is that if I have N
instances of the same copy-field rule, I have to issue N deletes to remove
them all.  This behavior matches the add behavior and needs to be kept.

Steve
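
For reference, a minimal sketch of the two commands being discussed here
(the collection and field names are illustrative only):

# each add-copy-field appends another identical rule...
curl -X POST -H 'Content-type:application/json' --data-binary \
  '{ "add-copy-field": { "source":"title", "dest":"text" } }' \
  http://localhost:8983/solr/mycollection/schema

# ...and each delete-copy-field removes one of them, so N adds need N deletes
curl -X POST -H 'Content-type:application/json' --data-binary \
  '{ "delete-copy-field": { "source":"title", "dest":"text" } }' \
  http://localhost:8983/solr/mycollection/schema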

On Wed, May 6, 2015 at 8:10 PM, Steve Rowe  wrote:

> Hi Steve,
>
> It’s by design that you can copyField the same source/dest multiple times
> - according to Yonik (not sure where this was discussed), this capability
> has been used in the past to effectively boost terms in the source field.
>
> The API isn’t symmetric here, though: I’m guessing deleting a multiply
> specified copy field rule will delete all of them, but this isn’t tested,
> so I’m not sure.
>
> There is no replace-copy-field command because copy field rules don’t have
> dependencies (i.e., nothing else in the schema refers to copy field rules),
> unlike fields, dynamic fields and field types, so
> delete-copy-field/add-copy-field works as one would expect.
>
> For fields, dynamic fields and field types, a delete followed by an add is
> not the same as a replace, since (dynamic) fields could have dependent
> copyFields, and field types could have dependent (dynamic) fields.
> delete-* commands are designed to fail if there are any existing
> dependencies, while the replace-* commands will maintain the dependencies
> if they exist.
>
> Steve
>
> > On May 6, 2015, at 6:44 PM, Steven White  wrote:
> >
> > Hi Everyone,
> >
> > I am using the Schema API to add a new copy field per:
> >
> https://cwiki.apache.org/confluence/display/solr/Schema+API#SchemaAPI-AddaNewCopyFieldRule
> >
> > Unlike the other "Add" APIs, this one will not fail if you add an
> existing
> > copy field object.  In fact, after when I call the API over and over, the
> > item will appear over and over in schema.xml file like so:
> >
> >   <copyField source="..." dest="..."/>
> >   <copyField source="..." dest="..."/>
> >   <copyField source="..." dest="..."/>
> >   <copyField source="..." dest="..."/>
> >
> > Is this the expected behaviour or a bug?  As a side question, is there
> > any harm in having multiple "copyField" entries like I ended up with?
> >
> > A final question: why is there no Replace a Copy Field?  Is this by
> > design, for some limitation, or was the API just never implemented?
> >
> > Thanks
> >
> > Steve
>
>


Re: Limit the documents for each shard in solr cloud

2015-05-07 Thread Erick Erickson
bq: We will not be able to limit the documents per shard in SolrCloud, as
Solr will accept all documents as long as there is space to index them.

True, end of story ;).

How does Solr know it will run out of space? It just hits an exception;
there's really no "this doesn't look like it will fit, so let's not
index it". But that's not really a problem, because you need at least
as much free space on your disk as the index size to handle merges, so
you'll run into many, many, many other problems before you fill up
your disk.

The hashing function that's used to distribute the documents across the
shards has not had any reports of significant uneven distribution that
I know of. So simply dividing the number of docs by the number of shards
and assuming that number (+/- a very small number, < 1%) of docs will
land on each shard is usually good enough: in your case, 20 million docs
over 4 shards works out to roughly the 5 million per shard you want.
If you see something different, it would be good to know.

Best,
Erick

On Thu, May 7, 2015 at 12:45 AM, Jilani Shaik  wrote:
> Hi Daniel,
>
> Thanks for the detailed explanation.
>
> My understanding is also similar to yours: we should not impose a limit on
> the number of documents a shard can index. It depends on the shard routing
> provided by Solr, and I am not expecting any change to the document routing
> process.
>
> My team needs that option if at all possible. Before saying "it is not
> possible at the Solr end to limit the documents per shard", I just want to
> get confirmation or some details. So I dropped the question here to get
> answers.
>
> You mentioned "as long as it has sufficient space to index" - how will Solr
> know or estimate whether it has sufficient space to index on a particular
> shard, or on the entire cloud?
>
> Conclusion of my understanding:
> We will not be able to limit the documents per shard in SolrCloud, as Solr
> will accept all documents as long as there is space to index them.
>
> Please suggest.
>
> Thanks,
> Jilani
>
> On Thu, May 7, 2015 at 12:41 PM, Daniel Collins 
> wrote:
>
>> Not sure I understand your problem.  If you have 20m documents, and 8
>> shards, then each shard is (broadly speaking) only going to have 2.5m docs
>> each, so I don't follow the 5m limit. That is with the default
>> routing/hashing; obviously you can write your own hash algorithm or you
>> can shard at your application level.
>>
>> In terms of limiting documents in a shard, I'm not sure what purpose that
>> would serve.  If, for argument's sake, you only had 2 shards and a limit
>> of 5m docs per shard, what happens when you hit that limit?  If you have
>> indexed 10m docs and now try to index one more, what would you expect
>> to happen? Would the system just reject any documents? Should it try to
>> route to shard 1, see that it is full, and then fail over to shard 2 instead
>> (that's not going to work as sharding needs to be reproducible and the
>> document was intended for shard 1)?
>>
>> Solr's basic premise would be to index what you gave it, as long as it has
>> sufficient space to do that.  If you want to limit your index to 20m docs,
>> that is probably better done at the application layer (but I still don't
>> really see why you would want to do that).
>>
>> On 7 May 2015 at 06:29, Jilani Shaik  wrote:
>>
>> > Hi,
>> >
>> > Is it possible to restrict the number of documents per shard in
>> > SolrCloud?
>> >
>> > Let's say we have a SolrCloud with 4 nodes, and on each node we have one
>> > leader and one replica. Likewise, in total we have 8 shards, including
>> > replicas. Now I need to index my documents in such a way that each shard
>> > will have only 5 million documents. The total in the SolrCloud should
>> > be 20 million documents.
>> >
>> >
>> > Thanks,
>> > Jilani
>> >
>>


Re: A defect in Schema API with Add a New Copy Field Rule?

2015-05-07 Thread Steve Rowe

> On May 6, 2015, at 8:25 PM, Yonik Seeley  wrote:
> 
> On Wed, May 6, 2015 at 8:10 PM, Steve Rowe  wrote:
>> It’s by design that you can copyField the same source/dest multiple times - 
>> according to Yonik (not sure where this was discussed), this capability has 
>> been used in the past to effectively boost terms in the source field.
> 
> Yep, used to be relatively common.
> Perhaps the API could be cleaner though if we supported that by
> passing an optional "numTimes" or "numCopies"?  Seems like sane
> delete / overwrite options would thus be easier.

+1

Re: solr.war built from solr 4.7.2 not working

2015-05-07 Thread Shawn Heisey
On 5/7/2015 3:43 AM, Rahul Singh wrote:
>   I have tried to deploy a solr.war built from 4.7.2, but it is showing
> the error below. Has anyone faced the same? Any lead would be
> appreciated.
> 
> Error Message:
> 
> {
>   "responseHeader": {
> "status": 500,
> "QTime": 33
>   },
>   "error": {
> "msg": "parsing error",
> "trace":
> "org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
> parsing error

Did you change the source code in any way before you compiled it?  You
haven't said what you're actually doing that resulted in this error, or
given any other details about your setup.  It's good that you've given
us the full response with the error, but additional details, like the
request that generated the error and any errors found in the Solr log,
are important.

Because the error comes from HttpSolrServer and is embedded in a Solr
response, I'm guessing this is a distributed request ... but I can't
tell if it's SolrCloud or "manual" sharding.

With no other information to go on, I do have some possible ideas:

You might have changed something fundamental in the source code that
makes the distributed request incompatible with the target core/server.

There might be mixed versions ... either multiple copies of jars on the
classpath from different versions of Solr, or a version with your code
changes trying to talk to another instance without your changes.

Are there error messages in the Solr log on the instance of Solr that
received the distributed request?

Thanks,
Shawn



Re: SolrCloud indexing

2015-05-07 Thread Shawn Heisey
On 5/7/2015 3:04 AM, Vincenzo D'Amore wrote:
> Thanks Erick. I'm not sure I got your answer.
> 
> I try to recap, when the raw document has to be indexed, it will be
> forwarded to shard leader. Shard leader indexes the document for that
> shard, and then forwards the indexed document to any replicas.
> 
> I want just be sure that when the raw document is forwarded from the leader
> to the replicas it will be indexed only one time on the shard leader. From
> what I understand replicas do not indexes, only the leader indexes.

The document is indexed by all replicas.  There is no way to forward an
already-indexed document; the leader can only forward the source document,
so each replica must index it independently.

The old-style master-slave replication (which existed long before
SolrCloud) copies the finished Lucene segments, so only the master
actually does indexing.

SolrCloud doesn't have a master, only multiple replicas, one of which is
elected leader, and replication only comes into the picture if there's a
serious problem and Solr determines that it can't use the transaction
log to recover the index.

Thanks,
Shawn



Re: Proximity searching in percentage

2015-05-07 Thread Alessandro Benedetti
Hi!
Currently Solr builds FSTs to provide proper fuzzy search or spellcheck
suggestions based on string distance.
The current default algorithm is the Levenshtein distance (which returns
the number of edits as the distance metric).
In your case you should calculate, client side, the number of edits you
want to apply to your search.
In your client code it should not be difficult to process the query and
apply the proper number of edits depending on the term length.

In any case, the maximum edit distance for the default Levenshtein
distance is fixed at 2.

Cheers
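
A minimal client-side sketch of that calculation (the 20% ratio and the
method name are illustrative assumptions, not part of Solr):

// Map a term's length to a fuzzy edit distance, capped at 2, which is
// Lucene's maximum supported Levenshtein distance.
static int fuzzyEdits(String term, double ratio) {
    return Math.min(2, (int) Math.floor(term.length() * ratio));
}

// e.g. with ratio = 0.2: a 5-char term yields "term~1", a 10-char term "term~2".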



2015-05-05 10:24 GMT+01:00 Zheng Lin Edwin Yeo :

> Hi,
>
> I would like to check: how do we implement character proximity searching
> in terms of a percentage of the word length, instead of a fixed edit
> distance (in characters)?
>
> For example, if we have a proximity of 20%, a word with 5 characters will
> have an edit distance of 1, and a word with 10 characters will
> automatically have an edit distance of 2.
>
> Will Solr be able to do that for us?
>
> Regards,
> Edwin
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


[ Clustering ] Request handler rows - Clustering rows

2015-05-07 Thread Alessandro Benedetti
Hi guys,
I was thinking about clustering and the integration of the clustering
search component into an existing request handler.

I am talking about online clustering (the clustering of search results).
Once you configure the search component with the engine definition,
clustering happens on the TOP rows results (where rows is the request
handler parameter).

I was wondering if it's possible to separate the rows for the request
handler from the rows for the clustering component.
It is probably not possible by default, but it could be, with some
development.
The use case to cover is the following:
"As a user I want to see the first N results. Because I am using the
generated clusters as additional facets, I would like to see clustering
applied not to N documents but to M documents (M > N).
Ideally I would like to see clustering applied to all the documents in
the result set; applying it to the first 100 or 200 can be a good
compromise as well."

What do you think?
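
For illustration, in a stock clustering setup the coupling looks like this
(a sketch; the handler name and engine are the usual example ones): the same
rows that controls what the user sees is exactly what the clustering
component consumes.

<requestHandler name="/clustering" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <bool name="clustering">true</bool>
    <bool name="clustering.results">true</bool>
    <str name="clustering.engine">default</str>
    <str name="rows">100</str>  <!-- the component clusters exactly these top-N docs -->
  </lst>
  <arr name="last-components">
    <str>clustering</str>
  </arr>
</requestHandler>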

-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


solr.war built from solr 4.7.2 not working

2015-05-07 Thread Rahul Singh
Hi,
  I have tried to deploy a solr.war built from 4.7.2, but it is showing
the error below. Has anyone faced the same? Any lead would be appreciated.

Error Message:

{
  "responseHeader": {
"status": 500,
"QTime": 33
  },
  "error": {
"msg": "parsing error",
"trace":
"org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
parsing error
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:477)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
at
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:157)
at
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:119)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.solr.common.SolrException: parsing error
at
org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:45)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:475)
... 9 more
Caused by: java.io.EOFException
at
org.apache.solr.common.util.FastInputStream.readByte(FastInputStream.java:193)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:172)
at org.apache.solr.common.util.JavaBinCodec.readArray(JavaBinCodec.java:477)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:186)
at
org.apache.solr.common.util.JavaBinCodec.readSolrDocumentList(JavaBinCodec.java:359)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:221)
at
org.apache.solr.common.util.JavaBinCodec.readOrderedMap(JavaBinCodec.java:125)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:188)
at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:116)
at
org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:43)
... 10 more
",
"code": 500
  }
}


Thanks and Regards,


schema modification issue

2015-05-07 Thread User Zolr
Hi there,

I have come across a problem: when using the managed schema in SolrCloud,
adding fields to the schema SOMETIMES fails with "Can't find
resource 'schema.xml' in classpath or '/configs/collectionName',
cwd=/export/solr/solr-5.1.0/server". There is of course no schema.xml in
the configs, only 'schema.xml.bak' and 'managed-schema'.

I use SolrJ to create a collection:

Path tempPath = getConfigPath();
// customized configs with solrconfig.xml using ManagedIndexSchemaFactory
client.uploadConfig(tempPath, name);
if (numShards == 0) {
    numShards = getNumNodes(client);
}
Create request = new CollectionAdminRequest.Create();
request.setCollectionName(name);
request.setNumShards(numShards);
replicationFactor = (replicationFactor == 0 ? DEFAULT_REPLICA_FACTOR : replicationFactor);
request.setReplicationFactor(replicationFactor);
request.setMaxShardsPerNode(maxShardsPerNode == 0 ? replicationFactor : maxShardsPerNode);
CollectionAdminResponse response = request.process(client);


Adding fields to the schema, either via curl or HttpClient, sometimes
yields the following error; it can be fixed by RELOADING the newly created
collection once or several times:

INFO  - [{
  "responseHeader":{"status":500,"QTime":5},
  "errors":["Error reading input String Can't find resource 'schema.xml' in
             classpath or '/configs/collectionName',
             cwd=/export/solr/solr-5.1.0/server"],
  "error":{
    "msg":"Can't find resource 'schema.xml' in classpath or
           '/configs/collectionName', cwd=/export/solr/solr-5.1.0/server",
    "trace":"java.io.IOException: Can't find resource 'schema.xml' in
             classpath or '/configs/collectionName',
             cwd=/export/solr/solr-5.1.0/server

at
org.apache.solr.cloud.ZkSolrResourceLoader.openResource(ZkSolrResourceLoader.java:98)
at
org.apache.solr.schema.SchemaManager.getFreshManagedSchema(SchemaManager.java:421)
at org.apache.solr.schema.SchemaManager.doOperations(SchemaManager.java:104)
at
org.apache.solr.schema.SchemaManager.performOperations(SchemaManager.java:94)
at
org.apache.solr.handler.SchemaHandler.handleRequestBody(SchemaHandler.java:57)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1984)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:829)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:446)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:745)\n","code":500}}]


Re: SolrCloud indexing

2015-05-07 Thread Vincenzo D'Amore
Thanks Erick. I'm not sure I got your answer.

Let me recap: when a raw document has to be indexed, it is forwarded to the
shard leader. The shard leader indexes the document for that shard, and then
forwards the indexed document to the replicas.

I just want to be sure that when the raw document is forwarded from the
leader to the replicas, it is indexed only once, on the shard leader. From
what I understand, replicas do not index; only the leader indexes.

Best regards,
Vincenzo


On Wed, May 6, 2015 at 3:07 AM, Erick Erickson 
wrote:

> bq: Does it mean that all the indexing is done by the leaders in one node?
>
> no. The raw document is forwarded from the leader to the replica and
> it's indexed on all the nodes. The leader has a little bit of extra
> work to do routing the docs, but that's it. Shouldn't be a problem
> with 3 shards.
>
> bq: If so, how do I distribute the indexing (the shard leaders) across
> nodes?
>
> You don't really need to bother I don't think, especially if you don't
> see significantly higher CPU utilization on the leader. If you
> absolutely MUST distribute leadership, see the Collections API and the
> REBALANCELEADERS and BALANCESHARDUNIQUE (Solr 5.1 only) but frankly I
> wouldn't worry about it unless and until you had demonstrated need.
>
> Best,
> Erick
>
> On Tue, May 5, 2015 at 6:28 AM, Vincenzo D'Amore 
> wrote:
> > Hi all,
> >
> > I have 3 nodes and there are 3 shards, but looking at the SolrCloud
> > admin I see that all the leaders are on the same node.
> >
> > If I understood well looking at  solr documentation
> > <
> https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud
> >
> > :
> >
> >> When a document is sent to a machine for indexing, the system first
> >> determines if the machine is a replica or a leader.
> >> If the machine is a replica, the document is forwarded to the leader for
> >> processing.
> >> If the machine is a leader, SolrCloud determines which shard the
> document
> >> should go to, forwards the document the leader for that shard, indexes
> the
> >> document for this shard, and forwards the index notation to itself and
> any
> >> replicas.
> >
> >
> > So I have 3 nodes, with 3 shards and 2 replicas of each shard.
> >
> >
> http://picpaste.com/pics/Screen_Shot_2015-05-05_at_15.19.54-Xp8uztpt.1430832218.png
> >
> > Does it mean that all the indexing is done by the leaders in one node? If
> > so, how do I distribute the indexing (the shard leaders) across nodes?
> >
> >
> > --
> > Vincenzo D'Amore
> > email: v.dam...@gmail.com
> > skype: free.dev
> > mobile: +39 349 8513251
>



-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251


Re: Limit the documents for each shard in solr cloud

2015-05-07 Thread Jilani Shaik
Hi Daniel,

Thanks for the detailed explanation.

My understanding is also similar to yours: we should not impose a limit on
the number of documents a shard can index. It depends on the shard routing
provided by Solr, and I am not expecting any change to the document routing
process.

My team needs that option if at all possible. Before saying "it is not
possible at the Solr end to limit the documents per shard", I just want to
get confirmation or some details. So I dropped the question here to get
answers.

You mentioned "as long as it has sufficient space to index" - how will Solr
know or estimate whether it has sufficient space to index on a particular
shard, or on the entire cloud?

Conclusion of my understanding:
We will not be able to limit the documents per shard in SolrCloud, as Solr
will accept all documents as long as there is space to index them.

Please suggest.

Thanks,
Jilani

On Thu, May 7, 2015 at 12:41 PM, Daniel Collins 
wrote:

> Not sure I understand your problem.  If you have 20m documents, and 8
> shards, then each shard is (broadly speaking) only going to have 2.5m docs
> each, so I don't follow the 5m limit. That is with the default
> routing/hashing; obviously you can write your own hash algorithm or you
> can shard at your application level.
>
> In terms of limiting documents in a shard, I'm not sure what purpose that
> would serve.  If, for argument's sake, you only had 2 shards and a limit
> of 5m docs per shard, what happens when you hit that limit?  If you have
> indexed 10m docs and now try to index one more, what would you expect
> to happen? Would the system just reject any documents? Should it try to
> route to shard 1, see that it is full, and then fail over to shard 2 instead
> (that's not going to work as sharding needs to be reproducible and the
> document was intended for shard 1)?
>
> Solr's basic premise would be to index what you gave it, as long as it has
> sufficient space to do that.  If you want to limit your index to 20m docs,
> that is probably better done at the application layer (but I still don't
> really see why you would want to do that).
>
> On 7 May 2015 at 06:29, Jilani Shaik  wrote:
>
> > Hi,
> >
> > Is it possible to restrict the number of documents per shard in
> > SolrCloud?
> >
> > Let's say we have a SolrCloud with 4 nodes, and on each node we have one
> > leader and one replica. Likewise, in total we have 8 shards, including
> > replicas. Now I need to index my documents in such a way that each shard
> > will have only 5 million documents. The total in the SolrCloud should
> > be 20 million documents.
> >
> >
> > Thanks,
> > Jilani
> >
>


Re: Limit the documents for each shard in solr cloud

2015-05-07 Thread Daniel Collins
Not sure I understand your problem.  If you have 20m documents, and 8
shards, then each shard is (broadly speaking) only going to have 2.5m docs
each, so I don't follow the 5m limit. That is with the default
routing/hashing; obviously you can write your own hash algorithm or you can
shard at your application level.

In terms of limiting documents in a shard, I'm not sure what purpose that
would serve.  If, for argument's sake, you only had 2 shards and a limit of
5m docs per shard, what happens when you hit that limit?  If you have
indexed 10m docs and now try to index one more, what would you expect
to happen? Would the system just reject any documents? Should it try to
route to shard 1, see that it is full, and then fail over to shard 2 instead
(that's not going to work as sharding needs to be reproducible and the
document was intended for shard 1)?

Solr's basic premise would be to index what you gave it, as long as it has
sufficient space to do that.  If you want to limit your index to 20m docs,
that is probably better done at the application layer (but I still don't
really see why you would want to do that).
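
If you do go the application-layer route, a sketch under assumptions (SolrJ
4.x; the core URL and the 5M cap are illustrative): count what a shard core
already holds before deciding whether to index more, since Solr itself never
rejects documents as "shard full".

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class ShardCapCheck {
    public static boolean shardFull(String coreUrl, long cap) throws SolrServerException {
        HttpSolrServer shard = new HttpSolrServer(coreUrl);
        SolrQuery q = new SolrQuery("*:*");
        q.setRows(0);               // only the count is needed
        q.set("distrib", "false");  // count this core only, not the whole collection
        return shard.query(q).getResults().getNumFound() >= cap;
    }
}

// e.g. shardFull("http://host:8983/solr/collection1_shard1_replica1", 5000000L)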

On 7 May 2015 at 06:29, Jilani Shaik  wrote:

> Hi,
>
> Is it possible to restrict the number of documents per shard in SolrCloud?
>
> Let's say we have a SolrCloud with 4 nodes, and on each node we have one
> leader and one replica. Likewise, in total we have 8 shards, including
> replicas. Now I need to index my documents in such a way that each shard
> will have only 5 million documents. The total in the SolrCloud should be
> 20 million documents.
>
>
> Thanks,
> Jilani
>