Re: Parallel optimize of index on SolrCloud.

2014-07-08 Thread Shalin Shekhar Mangar
Hi Walter,

I wonder why you think SolrCloud isn't necessary if you're indexing once
per week. Isn't the automatic failover and auto-sharding still useful? One
can also do custom sharding with SolrCloud if necessary.


On Wed, Jul 9, 2014 at 11:38 AM, Walter Underwood 
wrote:

> More memory or faster disks will make a much bigger improvement than a
> forced merge.
>
> What are you measuring? If it is average query time, that is not a good
> measure. Look at 90th or 95th percentile. Test with queries from logs.
>
> No user can see a 10% or 20% difference. If your managers are watching
> that, they are watching the wrong thing.
>
> If you are indexing once per week, you don't really need the complexity of
> Solr Cloud. You can do manual sharding.
>
> wunder
>
> On Jul 8, 2014, at 10:55 PM, Modassar Ather 
> wrote:
>
> > Our index has almost 100M documents running on SolrCloud of 3 shards and
> > each shard has an index size of about 700GB (for the record, we are not
> > using stored fields - our documents are pretty large). We perform a full
> > indexing every weekend and during the week there are no updates made to
> the
> > index. Most of the queries that we run are pretty complex with hundreds
> of
> > terms using PhraseQuery, BooleanQuery, SpanQuery, Wildcards, boosts etc.
> > and take many minutes to execute. A difference of 10-20% is also a big
> > advantage for us.
> >
> > We have been optimizing the index after indexing for years and it has
> > worked well for us. Every once in a while, we upgrade Solr to the latest
> > version and try without optimizing so that we can save the many hours it
> > take to optimize such a huge index, but it does not work well.
> >
> > Kindly provide your suggestion.
> >
> > Thanks,
> > Modassar
> >
> >
> > On Wed, Jul 9, 2014 at 10:47 AM, Walter Underwood  >
> > wrote:
> >
> >> I seriously doubt that you are required to force merge.
> >>
> >> How much improvement? And is the big performance cost also OK?
> >>
> >> I have worked on search engines that do automatic merges and offer
> forced
> >> merges for over fifteen years. For all that time, forced merges have
> >> usually caused problems.
> >>
> >> Stop doing forced merges.
> >>
> >> wunder
> >>
> >> On Jul 8, 2014, at 10:09 PM, Modassar Ather 
> >> wrote:
> >>
> >>> Thanks Walter for your inputs.
> >>>
> >>> Our use case and performance benchmark requires us to invoke optimize.
> >>>
> >>> Here we see a chance of improvement in performance of optimize() if
> >> invoked
> >>> in parallel.
> >>> I found that if* distrib=false *is used, the optimization will happen
> in
> >>> parallel.
> >>>
> >>> But I could not find a way to set it using
> >> HttpSolrServer/CloudSolrServer.
> >>> Also with the parameter setting as given in my mail above does not
> seems
> >> to
> >>> work.
> >>>
> >>> Please let me know in what ways I can achieve the parallel optimize on
> >>> SolrCloud.
> >>>
> >>> Thanks,
> >>> Modassar
> >>>
> >>> On Tue, Jul 8, 2014 at 7:53 PM, Walter Underwood <
> wun...@wunderwood.org>
> >>> wrote:
> >>>
>  You probably do not need to force merge (mistakenly called "optimize")
>  your index.
> 
>  Solr does automatic merges, which work just fine.
> 
>  There are only a few situations where a forced merge is even a good
> >> idea.
>  The most common one is a replicated (non-cloud) setup with a full
> >> reindex
>  every night.
> 
>  If you need Solr Cloud, I cannot think of a situation where you would
> >> want
>  a forced merge.
> 
>  wunder
> 
>  On Jul 8, 2014, at 2:01 AM, Modassar Ather 
> >> wrote:
> 
> > Hi,
> >
> > Need to optimize index created using CloudSolrServer APIs under
> >> SolrCloud
> > setup of 3 instances on separate machines. Currently it optimizes
> > sequentially if I invoke cloudSolrServer.optimize().
> >
> > To make it parallel I tried making three separate HttpSolrServer
>  instances
> > and invoked httpSolrServer.opimize() on them parallely but still it
> >> seems
> > to be doing optimization sequentially.
> >
> > I tried invoking optimize directly using HttpPost with following url
> >> and
> > parameters but still it seems to be sequential.
> > *URL* : http://host:port/solr/collection/update
> >
> > *Parameters*:
> > params.add(new BasicNameValuePair("optimize", "true"));
> > params.add(new BasicNameValuePair("maxSegments", "1"));
> > params.add(new BasicNameValuePair("waitFlush", "true"));
> > params.add(new BasicNameValuePair("distrib", "false"));
> >
> > Kindly provide your suggestion and help.
> >
> > Regards,
> > Modassar
> 
> 
> 
> 
> 
> >>
> >> --
> >> Walter Underwood
> >> wun...@wunderwood.org
> >>
> >>
> >>
> >>
>
> --
> Walter Underwood
> wun...@wunderwood.org
>
>
>
>


-- 
Regards,
Shalin Shekhar Mangar.


Re: Parallel optimize of index on SolrCloud.

2014-07-08 Thread Walter Underwood
More memory or faster disks will make a much bigger improvement than a forced 
merge.

What are you measuring? If it is average query time, that is not a good 
measure. Look at 90th or 95th percentile. Test with queries from logs.

No user can see a 10% or 20% difference. If your managers are watching that, 
they are watching the wrong thing.

If you are indexing once per week, you don't really need the complexity of Solr 
Cloud. You can do manual sharding.

wunder

On Jul 8, 2014, at 10:55 PM, Modassar Ather  wrote:

> Our index has almost 100M documents running on SolrCloud of 3 shards and
> each shard has an index size of about 700GB (for the record, we are not
> using stored fields - our documents are pretty large). We perform a full
> indexing every weekend and during the week there are no updates made to the
> index. Most of the queries that we run are pretty complex with hundreds of
> terms using PhraseQuery, BooleanQuery, SpanQuery, Wildcards, boosts etc.
> and take many minutes to execute. A difference of 10-20% is also a big
> advantage for us.
> 
> We have been optimizing the index after indexing for years and it has
> worked well for us. Every once in a while, we upgrade Solr to the latest
> version and try without optimizing so that we can save the many hours it
> take to optimize such a huge index, but it does not work well.
> 
> Kindly provide your suggestion.
> 
> Thanks,
> Modassar
> 
> 
> On Wed, Jul 9, 2014 at 10:47 AM, Walter Underwood 
> wrote:
> 
>> I seriously doubt that you are required to force merge.
>> 
>> How much improvement? And is the big performance cost also OK?
>> 
>> I have worked on search engines that do automatic merges and offer forced
>> merges for over fifteen years. For all that time, forced merges have
>> usually caused problems.
>> 
>> Stop doing forced merges.
>> 
>> wunder
>> 
>> On Jul 8, 2014, at 10:09 PM, Modassar Ather 
>> wrote:
>> 
>>> Thanks Walter for your inputs.
>>> 
>>> Our use case and performance benchmark requires us to invoke optimize.
>>> 
>>> Here we see a chance of improvement in performance of optimize() if
>> invoked
>>> in parallel.
>>> I found that if* distrib=false *is used, the optimization will happen in
>>> parallel.
>>> 
>>> But I could not find a way to set it using
>> HttpSolrServer/CloudSolrServer.
>>> Also with the parameter setting as given in my mail above does not seems
>> to
>>> work.
>>> 
>>> Please let me know in what ways I can achieve the parallel optimize on
>>> SolrCloud.
>>> 
>>> Thanks,
>>> Modassar
>>> 
>>> On Tue, Jul 8, 2014 at 7:53 PM, Walter Underwood 
>>> wrote:
>>> 
 You probably do not need to force merge (mistakenly called "optimize")
 your index.
 
 Solr does automatic merges, which work just fine.
 
 There are only a few situations where a forced merge is even a good
>> idea.
 The most common one is a replicated (non-cloud) setup with a full
>> reindex
 every night.
 
 If you need Solr Cloud, I cannot think of a situation where you would
>> want
 a forced merge.
 
 wunder
 
 On Jul 8, 2014, at 2:01 AM, Modassar Ather 
>> wrote:
 
> Hi,
> 
> Need to optimize index created using CloudSolrServer APIs under
>> SolrCloud
> setup of 3 instances on separate machines. Currently it optimizes
> sequentially if I invoke cloudSolrServer.optimize().
> 
> To make it parallel I tried making three separate HttpSolrServer
 instances
> and invoked httpSolrServer.opimize() on them parallely but still it
>> seems
> to be doing optimization sequentially.
> 
> I tried invoking optimize directly using HttpPost with following url
>> and
> parameters but still it seems to be sequential.
> *URL* : http://host:port/solr/collection/update
> 
> *Parameters*:
> params.add(new BasicNameValuePair("optimize", "true"));
> params.add(new BasicNameValuePair("maxSegments", "1"));
> params.add(new BasicNameValuePair("waitFlush", "true"));
> params.add(new BasicNameValuePair("distrib", "false"));
> 
> Kindly provide your suggestion and help.
> 
> Regards,
> Modassar
 
 
 
 
 
>> 
>> --
>> Walter Underwood
>> wun...@wunderwood.org
>> 
>> 
>> 
>> 

--
Walter Underwood
wun...@wunderwood.org





Re: External File Field eating memory

2014-07-08 Thread Kamal Kishore Aggarwal
Hi All,

It was found that external file, which was getting replicated after every
10 minutes was reloading the core as well. This was increasing the query
time.

Thanks
Kamal Kishore



On Thu, Jul 3, 2014 at 12:48 PM, Kamal Kishore Aggarwal <
kkroyal@gmail.com> wrote:

> With the above replication configuration, the eff file is getting
> replicated at core/conf/data/external_eff_views (new dir data is being
> created in conf dir) location, but it is not getting replicated at 
> core/data/external_eff_views
> on slave.
>
> Please help.
>
>
> On Thu, Jul 3, 2014 at 12:21 PM, Kamal Kishore Aggarwal <
> kkroyal@gmail.com> wrote:
>
>> Thanks for your guidance Alexandre Rafalovitch.
>>
>> I am looking into this seriously.
>>
>> Another question is that I facing error in replication of eff file
>>
>> This is master replication configuration:
>>
>> core/conf/solrconfig.xml
>>
>> 
>>> 
>>> commit
>>> startup
>>> ../data/external_eff_views
>>> 
>>> 
>>
>>
>> The eff file is present at core/data/external_eff_views location.
>>
>>
>> On Thu, Jul 3, 2014 at 11:50 AM, Shalin Shekhar Mangar <
>> shalinman...@gmail.com> wrote:
>>
>>> This might be related:
>>>
>>> https://issues.apache.org/jira/browse/SOLR-3514
>>>
>>>
>>> On Sat, Jun 28, 2014 at 5:34 PM, Kamal Kishore Aggarwal <
>>> kkroyal@gmail.com> wrote:
>>>
>>> > Hi Team,
>>> >
>>> > I have recently implemented EFF in solr. There are about 1.5
>>> lacs(unsorted)
>>> > values in the external file. After this implementation, the server has
>>> > become slow. The solr query time has also increased.
>>> >
>>> > Can anybody confirm me if these issues are because of this
>>> implementation.
>>> > Is that memory does EFF eats up?
>>> >
>>> > Regards
>>> > Kamal Kishore
>>> >
>>>
>>>
>>>
>>> --
>>> Regards,
>>> Shalin Shekhar Mangar.
>>>
>>
>>
>


Re: Parallel optimize of index on SolrCloud.

2014-07-08 Thread Modassar Ather
Our index has almost 100M documents running on SolrCloud of 3 shards and
each shard has an index size of about 700GB (for the record, we are not
using stored fields - our documents are pretty large). We perform a full
indexing every weekend and during the week there are no updates made to the
index. Most of the queries that we run are pretty complex with hundreds of
terms using PhraseQuery, BooleanQuery, SpanQuery, Wildcards, boosts etc.
and take many minutes to execute. A difference of 10-20% is also a big
advantage for us.

We have been optimizing the index after indexing for years and it has
worked well for us. Every once in a while, we upgrade Solr to the latest
version and try without optimizing so that we can save the many hours it
take to optimize such a huge index, but it does not work well.

Kindly provide your suggestion.

Thanks,
Modassar


On Wed, Jul 9, 2014 at 10:47 AM, Walter Underwood 
wrote:

> I seriously doubt that you are required to force merge.
>
> How much improvement? And is the big performance cost also OK?
>
> I have worked on search engines that do automatic merges and offer forced
> merges for over fifteen years. For all that time, forced merges have
> usually caused problems.
>
> Stop doing forced merges.
>
> wunder
>
> On Jul 8, 2014, at 10:09 PM, Modassar Ather 
> wrote:
>
> > Thanks Walter for your inputs.
> >
> > Our use case and performance benchmark requires us to invoke optimize.
> >
> > Here we see a chance of improvement in performance of optimize() if
> invoked
> > in parallel.
> > I found that if* distrib=false *is used, the optimization will happen in
> > parallel.
> >
> > But I could not find a way to set it using
> HttpSolrServer/CloudSolrServer.
> > Also with the parameter setting as given in my mail above does not seems
> to
> > work.
> >
> > Please let me know in what ways I can achieve the parallel optimize on
> > SolrCloud.
> >
> > Thanks,
> > Modassar
> >
> > On Tue, Jul 8, 2014 at 7:53 PM, Walter Underwood 
> > wrote:
> >
> >> You probably do not need to force merge (mistakenly called "optimize")
> >> your index.
> >>
> >> Solr does automatic merges, which work just fine.
> >>
> >> There are only a few situations where a forced merge is even a good
> idea.
> >> The most common one is a replicated (non-cloud) setup with a full
> reindex
> >> every night.
> >>
> >> If you need Solr Cloud, I cannot think of a situation where you would
> want
> >> a forced merge.
> >>
> >> wunder
> >>
> >> On Jul 8, 2014, at 2:01 AM, Modassar Ather 
> wrote:
> >>
> >>> Hi,
> >>>
> >>> Need to optimize index created using CloudSolrServer APIs under
> SolrCloud
> >>> setup of 3 instances on separate machines. Currently it optimizes
> >>> sequentially if I invoke cloudSolrServer.optimize().
> >>>
> >>> To make it parallel I tried making three separate HttpSolrServer
> >> instances
> >>> and invoked httpSolrServer.opimize() on them parallely but still it
> seems
> >>> to be doing optimization sequentially.
> >>>
> >>> I tried invoking optimize directly using HttpPost with following url
> and
> >>> parameters but still it seems to be sequential.
> >>> *URL* : http://host:port/solr/collection/update
> >>>
> >>> *Parameters*:
> >>> params.add(new BasicNameValuePair("optimize", "true"));
> >>> params.add(new BasicNameValuePair("maxSegments", "1"));
> >>> params.add(new BasicNameValuePair("waitFlush", "true"));
> >>> params.add(new BasicNameValuePair("distrib", "false"));
> >>>
> >>> Kindly provide your suggestion and help.
> >>>
> >>> Regards,
> >>> Modassar
> >>
> >>
> >>
> >>
> >>
>
> --
> Walter Underwood
> wun...@wunderwood.org
>
>
>
>


Re: Parallel optimize of index on SolrCloud.

2014-07-08 Thread Walter Underwood
I seriously doubt that you are required to force merge.

How much improvement? And is the big performance cost also OK?

I have worked on search engines that do automatic merges and offer forced 
merges for over fifteen years. For all that time, forced merges have usually 
caused problems.

Stop doing forced merges.

wunder

On Jul 8, 2014, at 10:09 PM, Modassar Ather  wrote:

> Thanks Walter for your inputs.
> 
> Our use case and performance benchmark requires us to invoke optimize.
> 
> Here we see a chance of improvement in performance of optimize() if invoked
> in parallel.
> I found that if* distrib=false *is used, the optimization will happen in
> parallel.
> 
> But I could not find a way to set it using HttpSolrServer/CloudSolrServer.
> Also with the parameter setting as given in my mail above does not seems to
> work.
> 
> Please let me know in what ways I can achieve the parallel optimize on
> SolrCloud.
> 
> Thanks,
> Modassar
> 
> On Tue, Jul 8, 2014 at 7:53 PM, Walter Underwood 
> wrote:
> 
>> You probably do not need to force merge (mistakenly called "optimize")
>> your index.
>> 
>> Solr does automatic merges, which work just fine.
>> 
>> There are only a few situations where a forced merge is even a good idea.
>> The most common one is a replicated (non-cloud) setup with a full reindex
>> every night.
>> 
>> If you need Solr Cloud, I cannot think of a situation where you would want
>> a forced merge.
>> 
>> wunder
>> 
>> On Jul 8, 2014, at 2:01 AM, Modassar Ather  wrote:
>> 
>>> Hi,
>>> 
>>> Need to optimize index created using CloudSolrServer APIs under SolrCloud
>>> setup of 3 instances on separate machines. Currently it optimizes
>>> sequentially if I invoke cloudSolrServer.optimize().
>>> 
>>> To make it parallel I tried making three separate HttpSolrServer
>> instances
>>> and invoked httpSolrServer.opimize() on them parallely but still it seems
>>> to be doing optimization sequentially.
>>> 
>>> I tried invoking optimize directly using HttpPost with following url and
>>> parameters but still it seems to be sequential.
>>> *URL* : http://host:port/solr/collection/update
>>> 
>>> *Parameters*:
>>> params.add(new BasicNameValuePair("optimize", "true"));
>>> params.add(new BasicNameValuePair("maxSegments", "1"));
>>> params.add(new BasicNameValuePair("waitFlush", "true"));
>>> params.add(new BasicNameValuePair("distrib", "false"));
>>> 
>>> Kindly provide your suggestion and help.
>>> 
>>> Regards,
>>> Modassar
>> 
>> 
>> 
>> 
>> 

--
Walter Underwood
wun...@wunderwood.org





Re: Add a new replica to SolrCloud

2014-07-08 Thread Himanshu Mehrotra
Yes, there is a way.

One node on which replica needs to be created hit

curl '
http://localhost:8983/solr/admin/cores?action=CREATE&name=&collection=&shard=<

shardid>'

For example

curl '
http://localhost:8983/solr/admin/cores?action=CREATE&name=mycore&collection=collection1&shard=shard2
'


see http://wiki.apache.org/solr/SolrCloud#Creating_cores_via_CoreAdmin
for details.


Thanks,

Himanshu



On Wed, Jul 9, 2014 at 9:46 AM, Varun Gupta  wrote:

> Hi,
>
> I am currently using Solr 4.7.2 and have SolrCloud setup running on 2
> servers with number of shards as 2, replication factor as 2 and mas shards
> per node as 4.
>
> Now, I want to add another server to the SolrCloud as a replica. I can see
> Collection API to add a new replica but that was added in Solr 4.8. Is
> there some way to add a new replica in Solr 4.7.2?
>
> --
> Thanks
> Varun Gupta
>


Re: Add a new replica to SolrCloud

2014-07-08 Thread Shalin Shekhar Mangar
Yes, you can just call a Core Admin CREATE on the new node with the
collection name and optionally the shard name.


On Wed, Jul 9, 2014 at 9:46 AM, Varun Gupta  wrote:

> Hi,
>
> I am currently using Solr 4.7.2 and have SolrCloud setup running on 2
> servers with number of shards as 2, replication factor as 2 and mas shards
> per node as 4.
>
> Now, I want to add another server to the SolrCloud as a replica. I can see
> Collection API to add a new replica but that was added in Solr 4.8. Is
> there some way to add a new replica in Solr 4.7.2?
>
> --
> Thanks
> Varun Gupta
>



-- 
Regards,
Shalin Shekhar Mangar.


Planning ahead for Solr Cloud and Scaling

2014-07-08 Thread Zane Rockenbaugh
I'm working on a product hosted with AWS that uses Elastic Beanstalk
auto-scaling to good effect and we are trying to set up similar (more or
less) runtime scaling support with Solr. I think I understand how to set
this up, and wanted to check I was on the right track.

We currently run 3 cores on a single host / Solr server / shard. This is
just fine for now, and we have overhead for the near future. However, I
need to have a plan, and then test, for a higher capacity future.

1) I gather that if I set up SolrCloud, and then later load increases, I
can spin up a second host / Solr server, create a new shard, and then split
the first shard:

https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api3

And doing this, we no longer have to commit to shards out of the gate.

2) I'm not clear whether there's a big advantage splitting up the cores or
not. Two of the three cores will have about the same number of documents,
though only one contains large amounts of text. The third core is much
smaller in both bytes and documents (2 orders of magnitude).

3) We are also looking at moving multi-lingual. The current plan is to
store the localized text in fields within the same core. The languages will
be added over time. We can update the schema (as each will be optional).
This seems easier than adding a core for each language. Is there a downside?

Thanks for any pointers.


Re: Parallel optimize of index on SolrCloud.

2014-07-08 Thread Modassar Ather
Thanks Walter for your inputs.

Our use case and performance benchmark requires us to invoke optimize.

Here we see a chance of improvement in performance of optimize() if invoked
in parallel.
I found that if* distrib=false *is used, the optimization will happen in
parallel.

But I could not find a way to set it using HttpSolrServer/CloudSolrServer.
Also with the parameter setting as given in my mail above does not seems to
work.

Please let me know in what ways I can achieve the parallel optimize on
SolrCloud.

Thanks,
Modassar



On Tue, Jul 8, 2014 at 7:53 PM, Walter Underwood 
wrote:

> You probably do not need to force merge (mistakenly called "optimize")
> your index.
>
> Solr does automatic merges, which work just fine.
>
> There are only a few situations where a forced merge is even a good idea.
> The most common one is a replicated (non-cloud) setup with a full reindex
> every night.
>
> If you need Solr Cloud, I cannot think of a situation where you would want
> a forced merge.
>
> wunder
>
> On Jul 8, 2014, at 2:01 AM, Modassar Ather  wrote:
>
> > Hi,
> >
> > Need to optimize index created using CloudSolrServer APIs under SolrCloud
> > setup of 3 instances on separate machines. Currently it optimizes
> > sequentially if I invoke cloudSolrServer.optimize().
> >
> > To make it parallel I tried making three separate HttpSolrServer
> instances
> > and invoked httpSolrServer.opimize() on them parallely but still it seems
> > to be doing optimization sequentially.
> >
> > I tried invoking optimize directly using HttpPost with following url and
> > parameters but still it seems to be sequential.
> > *URL* : http://host:port/solr/collection/update
> >
> > *Parameters*:
> > params.add(new BasicNameValuePair("optimize", "true"));
> > params.add(new BasicNameValuePair("maxSegments", "1"));
> > params.add(new BasicNameValuePair("waitFlush", "true"));
> > params.add(new BasicNameValuePair("distrib", "false"));
> >
> > Kindly provide your suggestion and help.
> >
> > Regards,
> > Modassar
>
>
>
>
>


Synchronising two masters

2014-07-08 Thread Prasi S
Hi ,
Our solr setup consists of 2 Masters and 2Slaves. The slaves would point to
any one of the Masters through a load balancer and replicate the data.

Master1(M1) is the primary indexer. I send data to M1. In case M1 fails, i
have a failover master, M2 and that would be indexing the data. The problem
is, once the Master1 comes up, how to synchornize M1 and M2? SolrCloud
would the option rather that going with this setup. But, currently we want
it to be implemented in Master-Slave mode.

Any suggestions?
Thanks,
Prasi


Add a new replica to SolrCloud

2014-07-08 Thread Varun Gupta
Hi,

I am currently using Solr 4.7.2 and have SolrCloud setup running on 2
servers with number of shards as 2, replication factor as 2 and mas shards
per node as 4.

Now, I want to add another server to the SolrCloud as a replica. I can see
Collection API to add a new replica but that was added in Solr 4.8. Is
there some way to add a new replica in Solr 4.7.2?

--
Thanks
Varun Gupta


Re: fix wiki error

2014-07-08 Thread Alexandre Rafalovitch
Why do you think so?

As of Solr 4, the CSV and JSON handlers have been unified in the
general update handler and the /update/json is there for legacy
reason.

The example should work. If it is not for you, it might be a different reason.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Wed, Jul 9, 2014 at 9:56 AM, Susmit Shukla  wrote:
> The url for solr atomic update documentation should contain json in the end.
> Here is the page -
> https://wiki.apache.org/solr/UpdateJSON#Solr_4.0_Example
>
> curl http://localhost:8983/solr/update/*json* -H 
> 'Content-type:application/json'


fix wiki error

2014-07-08 Thread Susmit Shukla
The url for solr atomic update documentation should contain json in the end.
Here is the page -
https://wiki.apache.org/solr/UpdateJSON#Solr_4.0_Example

curl http://localhost:8983/solr/update/*json* -H 'Content-type:application/json'


Re: Solr atomic updates question

2014-07-08 Thread Bill Au
I see what you mean now.  Thanks for the example.  It makes things very
clear.

I have been thinking about the explanation in the original response more.
 According to that, both regular update with entire doc and atomic update
involves a delete by id followed by a add.  But both the Solr reference doc
(
https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents)
says that:

"The first is *atomic updates*. This approach allows changing only one or
more fields of a document without having to re-index the entire document."

But since Solr is doing a delete by id followed by a add, so "without
having to re-index the entire document" apply to the client side only?  On
the server side the add means that the entire document is re-indexed, right?

Bill


On Tue, Jul 8, 2014 at 7:32 PM, Steve McKay  wrote:

> Take a look at this update XML:
>
> 
>   
> 05991
> Steve McKay
> Walla Walla
> Python
>   
> 
>
> Let's say employeeId is the key. If there's a fourth field, salary, on the
> existing doc, should it be deleted or retained? With this update it will
> obviously be deleted:
>
> 
>   
> 05991
> Steve McKay
>   
> 
>
> With this XML it will be retained:
>
> 
>   
> 05991
> Walla Walla
> Python
>   
> 
>
> I'm not willing to guess what will happen in the case where non-atomic and
> atomic updates are present on the same add because I haven't looked at that
> code since 4.0, but I think I could make a case for retaining salary or for
> discarding it. That by itself reeks--and it's also not well documented.
> Relying on iffy, poorly-documented behavior is asking for pain at upgrade
> time.
>
> Steve
>
> On Jul 8, 2014, at 7:02 PM, Bill Au  wrote:
>
> > Thanks for that under-the-cover explanation.
> >
> > I am not sure what you mean by "mix atomic updates with regular field
> > values".  Can you give an example?
> >
> > Thanks.
> >
> > Bill
> >
> >
> > On Tue, Jul 8, 2014 at 6:56 PM, Steve McKay  wrote:
> >
> >> Atomic updates fetch the doc with RealTimeGet, apply the updates to the
> >> fetched doc, then reindex. Whether you use atomic updates or send the
> >> entire doc to Solr, it has to deleteById then add. The perf difference
> >> between the atomic updates and "normal" updates is likely minimal.
> >>
> >> Atomic updates are for when you have changes and want to apply them to a
> >> document without affecting the other fields. A regular add will replace
> an
> >> existing document completely. AFAIK Solr will let you mix atomic updates
> >> with regular field values, but I don't think it's a good idea.
> >>
> >> Steve
> >>
> >> On Jul 8, 2014, at 5:30 PM, Bill Au  wrote:
> >>
> >>> Solr atomic update allows for changing only one or more fields of a
> >>> document without having to re-index the entire document.  But what
> about
> >>> the case where I am sending in the entire document?  In that case the
> >> whole
> >>> document will be re-indexed anyway, right?  So I assume that there will
> >> be
> >>> no saving.  I am actually thinking that there will be a performance
> >> penalty
> >>> since atomic update requires Solr to first retrieve all the fields
> first
> >>> before updating.
> >>>
> >>> Bill
> >>
> >>
>
>


Re: Solr atomic updates question

2014-07-08 Thread Steve McKay
Take a look at this update XML:


  
05991
Steve McKay
Walla Walla
Python
  


Let's say employeeId is the key. If there's a fourth field, salary, on the 
existing doc, should it be deleted or retained? With this update it will 
obviously be deleted:


  
05991
Steve McKay
  


With this XML it will be retained:


  
05991
Walla Walla
Python
  


I'm not willing to guess what will happen in the case where non-atomic and 
atomic updates are present on the same add because I haven't looked at that 
code since 4.0, but I think I could make a case for retaining salary or for 
discarding it. That by itself reeks--and it's also not well documented. Relying 
on iffy, poorly-documented behavior is asking for pain at upgrade time.

Steve

On Jul 8, 2014, at 7:02 PM, Bill Au  wrote:

> Thanks for that under-the-cover explanation.
> 
> I am not sure what you mean by "mix atomic updates with regular field
> values".  Can you give an example?
> 
> Thanks.
> 
> Bill
> 
> 
> On Tue, Jul 8, 2014 at 6:56 PM, Steve McKay  wrote:
> 
>> Atomic updates fetch the doc with RealTimeGet, apply the updates to the
>> fetched doc, then reindex. Whether you use atomic updates or send the
>> entire doc to Solr, it has to deleteById then add. The perf difference
>> between the atomic updates and "normal" updates is likely minimal.
>> 
>> Atomic updates are for when you have changes and want to apply them to a
>> document without affecting the other fields. A regular add will replace an
>> existing document completely. AFAIK Solr will let you mix atomic updates
>> with regular field values, but I don't think it's a good idea.
>> 
>> Steve
>> 
>> On Jul 8, 2014, at 5:30 PM, Bill Au  wrote:
>> 
>>> Solr atomic update allows for changing only one or more fields of a
>>> document without having to re-index the entire document.  But what about
>>> the case where I am sending in the entire document?  In that case the
>> whole
>>> document will be re-indexed anyway, right?  So I assume that there will
>> be
>>> no saving.  I am actually thinking that there will be a performance
>> penalty
>>> since atomic update requires Solr to first retrieve all the fields first
>>> before updating.
>>> 
>>> Bill
>> 
>> 



Re: Solr atomic updates question

2014-07-08 Thread Bill Au
Thanks for that under-the-cover explanation.

I am not sure what you mean by "mix atomic updates with regular field
values".  Can you give an example?

Thanks.

Bill


On Tue, Jul 8, 2014 at 6:56 PM, Steve McKay  wrote:

> Atomic updates fetch the doc with RealTimeGet, apply the updates to the
> fetched doc, then reindex. Whether you use atomic updates or send the
> entire doc to Solr, it has to deleteById then add. The perf difference
> between the atomic updates and "normal" updates is likely minimal.
>
> Atomic updates are for when you have changes and want to apply them to a
> document without affecting the other fields. A regular add will replace an
> existing document completely. AFAIK Solr will let you mix atomic updates
> with regular field values, but I don't think it's a good idea.
>
> Steve
>
> On Jul 8, 2014, at 5:30 PM, Bill Au  wrote:
>
> > Solr atomic update allows for changing only one or more fields of a
> > document without having to re-index the entire document.  But what about
> > the case where I am sending in the entire document?  In that case the
> whole
> > document will be re-indexed anyway, right?  So I assume that there will
> be
> > no saving.  I am actually thinking that there will be a performance
> penalty
> > since atomic update requires Solr to first retrieve all the fields first
> > before updating.
> >
> > Bill
>
>


Re: Solr atomic updates question

2014-07-08 Thread Steve McKay
Atomic updates fetch the doc with RealTimeGet, apply the updates to the fetched 
doc, then reindex. Whether you use atomic updates or send the entire doc to 
Solr, it has to deleteById then add. The perf difference between the atomic 
updates and "normal" updates is likely minimal.

Atomic updates are for when you have changes and want to apply them to a 
document without affecting the other fields. A regular add will replace an 
existing document completely. AFAIK Solr will let you mix atomic updates with 
regular field values, but I don't think it's a good idea.

Steve

On Jul 8, 2014, at 5:30 PM, Bill Au  wrote:

> Solr atomic update allows for changing only one or more fields of a
> document without having to re-index the entire document.  But what about
> the case where I am sending in the entire document?  In that case the whole
> document will be re-indexed anyway, right?  So I assume that there will be
> no saving.  I am actually thinking that there will be a performance penalty
> since atomic update requires Solr to first retrieve all the fields first
> before updating.
> 
> Bill



Re: What does getSearcher method of SolrQueryRequest means ?

2014-07-08 Thread Yossi Biton
(Sorry - my mail was sent half ready)

hashes is an array of hash values generated some-how from the image.

So my question is what is the query being done in this part ?
I tried to reconstruct it by my own, by constructing select query with the
hash values seperated by OR but the results were different.
Any one can tell me why ?

This where the source code is : http://code.google.com/p/lire/



On Wed, Jul 9, 2014 at 1:29 AM, Yossi Biton  wrote:

> Hello there,
>
> I'm using a project named LIRE for image retrieval based on sole platform.
> There is part of the code which i can't understand, so maybe you could
> help me.
>
> The project implements request handler named lireq :
> public class LireRequestHandler extends RequestHandlerBase
>
> The search method in this handler is computed from lucene search +
> reranking.
> The first part goes like this :
> public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp)
> throws Exception {
> ...
> BooleanQuery query = new BooleanQuery();
> for (int i = 0; i < numHashes; i++) {
> query.add(new BooleanClause(new TermQuery(new Term(paramField,
> Integer.toHexString(hashes[i]))), BooleanClause.Occur.SHOULD));
> }
>
> SolrIndexSearcher searcher = req.getSearcher()
> TopDocs docs = searcher.search(query, candidateResultNumber);
>



-- 

יוסי


Re: Solr irregularly having QTime > 50000ms, stracing solr cures the problem

2014-07-08 Thread Steve McKay
Sure sounds like a socket bug, doesn't it? I turn to tcpdump when Solr starts 
behaving strangely in a socket-related way. Knowing exactly what's happening at 
the transport level is worth a month of guessing and poking.

On Jul 8, 2014, at 3:53 AM, Harald Kirsch  wrote:

> Hi all,
> 
> This is what happens when I run a regular wget query to log the current 
> number of documents indexed:
> 
> 2014-07-08:07:23:28 QTime=20 numFound="5720168"
> 2014-07-08:07:24:28 QTime=12 numFound="5721126"
> 2014-07-08:07:25:28 QTime=19 numFound="5721126"
> 2014-07-08:07:27:18 QTime=50071 numFound="5721126"
> 2014-07-08:07:29:08 QTime=50058 numFound="5724494"
> 2014-07-08:07:30:58 QTime=50033 numFound="5730710"
> 2014-07-08:07:31:58 QTime=13 numFound="5730710"
> 2014-07-08:07:33:48 QTime=50065 numFound="5734069"
> 2014-07-08:07:34:48 QTime=16 numFound="5737742"
> 2014-07-08:07:36:38 QTime=50037 numFound="5737742"
> 2014-07-08:07:37:38 QTime=12 numFound="5738190"
> 2014-07-08:07:38:38 QTime=23 numFound="5741208"
> 2014-07-08:07:40:29 QTime=50034 numFound="5742067"
> 2014-07-08:07:41:29 QTime=12 numFound="5742067"
> 2014-07-08:07:42:29 QTime=17 numFound="5742067"
> 2014-07-08:07:43:29 QTime=20 numFound="5745497"
> 2014-07-08:07:44:29 QTime=13 numFound="5745981"
> 2014-07-08:07:45:29 QTime=23 numFound="5746420"
> 
> As you can see, the QTime is just over 50 seconds at irregular intervals.
> 
> This happens independent of whether I am indexing documents with around 20 
> dps or not. First I thought about a dependence on the auto-commit of 5 
> minutes, but the the 50 seconds hits are too irregular.
> 
> Furthermore, and this is *really strange*: when hooking strace on the solr 
> process, the 50 seconds QTimes disappear completely and consistently --- a 
> real Heisenbug.
> 
> Nevertheless, strace shows that there is a socket timeout of 50 seconds 
> defined in calls like this:
> 
> [pid  1253] 09:09:37.857413 poll([{fd=96, events=POLLIN|POLLERR}], 1, 5) 
> = 1 ([{fd=96, revents=POLLIN}]) <0.40>
> 
> where the fd=96 is the result of
> 
> [pid 25446] 09:09:37.855235 accept(122, {sa_family=AF_INET, 
> sin_port=htons(57236), sin_addr=inet_addr("ip address of local host")}, [16]) 
> = 96 <0.54>
> 
> where again fd=122 is the TCP port on which solr was started.
> 
> My hunch is that this is communication between the cores of solr.
> 
> I tried to search the internet for such a strange connection between socket 
> timeouts and strace, but could not find anything (the stackoverflow entry 
> from yesterday is my own :-(
> 
> 
> This smells a bit like a race condition/deadlock kind of thing which is 
> broken up by timing differences introduced by stracing the process.
> 
> Any hints appreciated.
> 
> For completeness, here is my setup:
> - solr-4.8.1,
> - cloud version running
> - 10 shards on 10 cores in one instance
> - hosted on SUSE Linux Enterprise Server 11 (x86_64), VERSION 11, PATCHLEVEL 2
> - hosted on a vmware, 4 CPU cores, 16 GB RAM
> - single digit million docs indexed, exact number does not matter
> - zero query load
> 
> 
> Harald.



What does getSearcher method of SolrQueryRequest means ?

2014-07-08 Thread Yossi Biton
Hello there,

I'm using a project named LIRE for image retrieval based on sole platform.
There is part of the code which i can't understand, so maybe you could help
me.

The project implements request handler named lireq :
public class LireRequestHandler extends RequestHandlerBase

The search method in this handler is computed from lucene search +
reranking.
The first part goes like this :
public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp)
throws Exception {
...
BooleanQuery query = new BooleanQuery();
for (int i = 0; i < numHashes; i++) {
query.add(new BooleanClause(new TermQuery(new Term(paramField,
Integer.toHexString(hashes[i]))), BooleanClause.Occur.SHOULD));
}

SolrIndexSearcher searcher = req.getSearcher()
TopDocs docs = searcher.search(query, candidateResultNumber);


Solr atomic updates question

2014-07-08 Thread Bill Au
Solr atomic update allows for changing only one or more fields of a
document without having to re-index the entire document.  But what about
the case where I am sending in the entire document?  In that case the whole
document will be re-indexed anyway, right?  So I assume that there will be
no saving.  I am actually thinking that there will be a performance penalty
since atomic update requires Solr to first retrieve all the fields first
before updating.

Bill


RE: [Solr Schema API] SolrJ Access

2014-07-08 Thread Cario, Elaine
Alessandro,

I just got this to work myself:

public static final String DEFINED_FIELDS_API = "/schema/fields";
public static final String DYNAMIC_FIELDS_API = "/schema/dynamicfields";
...
// just get a connection to Solr as usual (the factory is mine - it 
will use CloudSolrServer or HttpSolrServer depending on if we're using 
SolrCloud or not)
SolrClient client = 
SolrClientFactory.getSolrClientInstance(CLOUD_ENABLED);
SolrServer solrConn = client.getConnection(SOLR_URL, collection);

SolrQuery query = new SolrQuery();
if (dynamicFields)
query.setRequestHandler(DYNAMIC_FIELDS_API);
else
query.setRequestHandler(DEFINED_FIELDS_API);
query.setParam("showDefaults", true);

QueryResponse response = solrConn.query(query)

Then you've got to parse the response using NamedList etc.etc.

-Original Message-
From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com] 
Sent: Tuesday, July 08, 2014 5:54 AM
To: solr-user@lucene.apache.org
Subject: [Solr Schema API] SolrJ Access

Hi guys,
wondering if there is any proper way to access Schema API via Solrj.

Of course is possible to reach them in Java with a specific Http Request, but 
in this way, using SolrCloud for example we become coupled to one specific 
instance ( and we don't want) .

Code Example :

HttpResponse httpResponse;
> String url=this.solrBase+"/"+core+ 
> SCHEMA_SOLR_FIELDS_ENDPOINT
> +fieldName;
> HttpPut httpPut = new HttpPut(url);
> StringEntity entity = new StringEntity(
> "{\"type\":\"text_general\",\"stored\":\"true\"}" ,
> ContentType.APPLICATION_JSON);
>  httpPut.setEntity( entity );
>  HttpClient client=new DefaultHttpClient();
>  response = client.execute(httpPut);


Any suggestion ?
In my opinion should be interesting to have some auxiliary method in SolrServer 
if it's not there yet.

Cheers

--
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


SOLR Talk at AOL Dulles Campus.

2014-07-08 Thread Rishi Easwaran
All, 
There is a tech talk on AOL Dulles campus tomorrow. Do swing by if you can and 
share it with your colleagues and friends. 
www.meetup.com/Code-Brew/events/192361672/
There will be free food and beer served at this event :)

Thanks,
Rishi.


Re: Solr irregularly having QTime > 50000ms, stracing solr cures the problem

2014-07-08 Thread Walter Underwood
Local disks or shared network disks?  --wunder


On Jul 8, 2014, at 11:43 AM, Shawn Heisey  wrote:

> On 7/8/2014 1:53 AM, Harald Kirsch wrote:
>> Hi all,
>> 
>> This is what happens when I run a regular wget query to log the
>> current number of documents indexed:
>> 
>> 2014-07-08:07:23:28 QTime=20 numFound="5720168"
>> 2014-07-08:07:24:28 QTime=12 numFound="5721126"
>> 2014-07-08:07:25:28 QTime=19 numFound="5721126"
>> 2014-07-08:07:27:18 QTime=50071 numFound="5721126"
>> 2014-07-08:07:29:08 QTime=50058 numFound="5724494"
>> 2014-07-08:07:30:58 QTime=50033 numFound="5730710"
>> 2014-07-08:07:31:58 QTime=13 numFound="5730710"
>> 2014-07-08:07:33:48 QTime=50065 numFound="5734069"
>> 2014-07-08:07:34:48 QTime=16 numFound="5737742"
>> 2014-07-08:07:36:38 QTime=50037 numFound="5737742"
>> 2014-07-08:07:37:38 QTime=12 numFound="5738190"
>> 2014-07-08:07:38:38 QTime=23 numFound="5741208"
>> 2014-07-08:07:40:29 QTime=50034 numFound="5742067"
>> 2014-07-08:07:41:29 QTime=12 numFound="5742067"
>> 2014-07-08:07:42:29 QTime=17 numFound="5742067"
>> 2014-07-08:07:43:29 QTime=20 numFound="5745497"
>> 2014-07-08:07:44:29 QTime=13 numFound="5745981"
>> 2014-07-08:07:45:29 QTime=23 numFound="5746420"
>> 
>> As you can see, the QTime is just over 50 seconds at irregular intervals.
>> 
>> This happens independent of whether I am indexing documents with
>> around 20 dps or not. First I thought about a dependence on the
>> auto-commit of 5 minutes, but the the 50 seconds hits are too irregular.
>> 
>> Furthermore, and this is *really strange*: when hooking strace on the
>> solr process, the 50 seconds QTimes disappear completely and
>> consistently --- a real Heisenbug.
>> 
>> Nevertheless, strace shows that there is a socket timeout of 50
>> seconds defined in calls like this:
>> 
>> [pid  1253] 09:09:37.857413 poll([{fd=96, events=POLLIN|POLLERR}], 1,
>> 5) = 1 ([{fd=96, revents=POLLIN}]) <0.40>
>> 
>> where the fd=96 is the result of
>> 
>> [pid 25446] 09:09:37.855235 accept(122, {sa_family=AF_INET,
>> sin_port=htons(57236), sin_addr=inet_addr("ip address of local
>> host")}, [16]) = 96 <0.54>
>> 
>> where again fd=122 is the TCP port on which solr was started.
>> 
>> My hunch is that this is communication between the cores of solr.
>> 
>> I tried to search the internet for such a strange connection between
>> socket timeouts and strace, but could not find anything (the
>> stackoverflow entry from yesterday is my own :-(
>> 
>> 
>> This smells a bit like a race condition/deadlock kind of thing which
>> is broken up by timing differences introduced by stracing the process.
>> 
>> Any hints appreciated.
>> 
>> For completeness, here is my setup:
>> - solr-4.8.1,
>> - cloud version running
>> - 10 shards on 10 cores in one instance
>> - hosted on SUSE Linux Enterprise Server 11 (x86_64), VERSION 11,
>> PATCHLEVEL 2
>> - hosted on a vmware, 4 CPU cores, 16 GB RAM
>> - single digit million docs indexed, exact number does not matter
>> - zero query load
> 
> Long GC pauses would also be my first guess.  DNS problems on the
> inter-server communication for SolrCloud would be a second guess.  If
> it's not one of these, then I really have no idea.
> 
> http://wiki.apache.org/solr/SolrPerformanceProblems#GC_pause_problems
> http://serverfault.com/questions/339791/5-second-resolving-delay
> 
> Thanks,
> Shawn
> 



Re: Solr irregularly having QTime > 50000ms, stracing solr cures the problem

2014-07-08 Thread Shawn Heisey
On 7/8/2014 1:53 AM, Harald Kirsch wrote:
> Hi all,
>
> This is what happens when I run a regular wget query to log the
> current number of documents indexed:
>
> 2014-07-08:07:23:28 QTime=20 numFound="5720168"
> 2014-07-08:07:24:28 QTime=12 numFound="5721126"
> 2014-07-08:07:25:28 QTime=19 numFound="5721126"
> 2014-07-08:07:27:18 QTime=50071 numFound="5721126"
> 2014-07-08:07:29:08 QTime=50058 numFound="5724494"
> 2014-07-08:07:30:58 QTime=50033 numFound="5730710"
> 2014-07-08:07:31:58 QTime=13 numFound="5730710"
> 2014-07-08:07:33:48 QTime=50065 numFound="5734069"
> 2014-07-08:07:34:48 QTime=16 numFound="5737742"
> 2014-07-08:07:36:38 QTime=50037 numFound="5737742"
> 2014-07-08:07:37:38 QTime=12 numFound="5738190"
> 2014-07-08:07:38:38 QTime=23 numFound="5741208"
> 2014-07-08:07:40:29 QTime=50034 numFound="5742067"
> 2014-07-08:07:41:29 QTime=12 numFound="5742067"
> 2014-07-08:07:42:29 QTime=17 numFound="5742067"
> 2014-07-08:07:43:29 QTime=20 numFound="5745497"
> 2014-07-08:07:44:29 QTime=13 numFound="5745981"
> 2014-07-08:07:45:29 QTime=23 numFound="5746420"
>
> As you can see, the QTime is just over 50 seconds at irregular intervals.
>
> This happens independent of whether I am indexing documents with
> around 20 dps or not. First I thought about a dependence on the
> auto-commit of 5 minutes, but the the 50 seconds hits are too irregular.
>
> Furthermore, and this is *really strange*: when hooking strace on the
> solr process, the 50 seconds QTimes disappear completely and
> consistently --- a real Heisenbug.
>
> Nevertheless, strace shows that there is a socket timeout of 50
> seconds defined in calls like this:
>
> [pid  1253] 09:09:37.857413 poll([{fd=96, events=POLLIN|POLLERR}], 1,
> 5) = 1 ([{fd=96, revents=POLLIN}]) <0.40>
>
> where the fd=96 is the result of
>
> [pid 25446] 09:09:37.855235 accept(122, {sa_family=AF_INET,
> sin_port=htons(57236), sin_addr=inet_addr("ip address of local
> host")}, [16]) = 96 <0.54>
>
> where again fd=122 is the TCP port on which solr was started.
>
> My hunch is that this is communication between the cores of solr.
>
> I tried to search the internet for such a strange connection between
> socket timeouts and strace, but could not find anything (the
> stackoverflow entry from yesterday is my own :-(
>
>
> This smells a bit like a race condition/deadlock kind of thing which
> is broken up by timing differences introduced by stracing the process.
>
> Any hints appreciated.
>
> For completeness, here is my setup:
> - solr-4.8.1,
> - cloud version running
> - 10 shards on 10 cores in one instance
> - hosted on SUSE Linux Enterprise Server 11 (x86_64), VERSION 11,
> PATCHLEVEL 2
> - hosted on a vmware, 4 CPU cores, 16 GB RAM
> - single digit million docs indexed, exact number does not matter
> - zero query load

Long GC pauses would also be my first guess.  DNS problems on the
inter-server communication for SolrCloud would be a second guess.  If
it's not one of these, then I really have no idea.

http://wiki.apache.org/solr/SolrPerformanceProblems#GC_pause_problems
http://serverfault.com/questions/339791/5-second-resolving-delay

Thanks,
Shawn



Re: Hypen in search keyword

2014-07-08 Thread Jack Krupansky
The word delimiter filter has a "types" parameter where you specify a file 
that can map hyphen to alpha or numeric.


There is an example in my e-book.

-- Jack Krupansky

-Original Message- 
From: EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)

Sent: Tuesday, July 8, 2014 2:18 PM
To: solr-user@lucene.apache.org
Subject: Hypen in search keyword

I have the below config for the field type text_general. But then I search 
with keyword e.g 100-001, it get 100-001,  100 in starting records & ending 
with 001 . I want to treat "-" as another character not to split.



positionIncrementGap="100">

 
 
 
   words="stopwords.txt" />



generateNumberParts="0" catenateWords="1" catenateNumbers="1" 
catenateAll="0"/>


   
 
 
 
 

   words="stopwords.txt" />
   ignoreCase="true" expand="true"/>

   
generateNumberParts="0" catenateWords="1" catenateNumbers="1" 
catenateAll="0"/>


 
   

Thanks

Ravi 



Hypen in search keyword

2014-07-08 Thread EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)
I have the below config for the field type text_general. But then I search with 
keyword e.g 100-001, it get 100-001,  100 in starting records & ending with 001 
. I want to treat "-" as another character not to split.



  
  
  






  
  
  
  





  
  


Thanks

Ravi


Re: Changing default behavior of solr for overwrite the whole document on uniquekey duplication

2014-07-08 Thread Chris Hostetter

I think you are missunderstanding what Himanshu is suggesting to you.

You don't need to make lots of big changes ot the internals of solr's code 
to get what you want -- instead you can leverage the Atomic Updates & 
Optimistic Concurrency features of Solr to get the existing internal Solr 
to reject any attempts to add a duplicate documentunless the client code 
sending the document specifies it should be an "update".

This means your client code needs to be a bit more sophisticated, but the 
benefit is that you don't have to try to make complex changes to the 
internals of Solr that may be impossible and/or difficult to 
support/upgrade later.

More details...

https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents#UpdatingPartsofDocuments-OptimisticConcurrency

Simplest possible idea based on the basic info you have given so far...

1) send every doc using _version_=-1
2a) if doc update fails with error 409, that means a version of this doc 
already exists
2b) resend just the field changes (using "set" atomic 
operation) and specify _version_=1



: Dear Himanshu,
: Hi,
: You misunderstood what I meant. I am not going to update some field. I am
: going to change what Solr do on duplication of uniquekey field. I dont want
: to solr overwrite Whole document I just want to overwrite some parts of
: document. This situation does not come from user side this is what solr do
: to documents with duplicated uniquekey.
: Regards.
: 
: 
: On Tue, Jul 8, 2014 at 12:29 PM, Himanshu Mehrotra <
: himanshu.mehro...@snapdeal.com> wrote:
: 
: > Please look at https://wiki.apache.org/solr/Atomic_Updates
: >
: > This does what you want just update relevant fields.
: >
: > Thanks,
: > Himanshu
: >
: >
: > On Tue, Jul 8, 2014 at 1:09 PM, Ali Nazemian 
: > wrote:
: >
: > > Dears,
: > > Hi,
: > > According to my requirement I need to change the default behavior of Solr
: > > for overwriting the whole document on unique-key duplication. I am going
: > to
: > > change that the overwrite just part of document (some fields) and other
: > > parts of document (other fields) remain unchanged. First of all I need to
: > > know such changing in Solr behavior is possible? Second, I really
: > > appreciate if you can guide me through what class/classes should I
: > consider
: > > for changing that?
: > > Best regards.
: > >
: > > --
: > > A.Nazemian
: > >
: >
: 
: 
: 
: -- 
: A.Nazemian
: 

-Hoss
http://www.lucidworks.com/


SolrCloud delete replica

2014-07-08 Thread Arvin Barooni
Hi,

I have an issue regarding collection delete.
when a solr node is in down mode and I delete a collection, all things
seems fine and it deletes the collection from cluster state too.
But when the dead node comes back it register the collection again.

Even when I delete the collection by DELETEREPLICA collection api, the core
inside the dead node starts to push the collection inside clusterstate.json

What is the true config for SolrCloud, ZooKeeper, the solr node or the
leader?

Is there a way to unload or delete the core in down node, after it becomes
active?

Thanks


Re: Transparently rebalancing a Solr cluster without splitting or moving shards

2014-07-08 Thread Damien Dykman

Thanks for your suggestions and recommendations.

If I understand correctly, the MIGRATE command does shard splitting 
(around the range of the split.key) and merging behind the scene. 
Though, it's a bit difficult to properly monitor the actual migration, 
set the proper timeouts, know when to direct indexing and search traffic 
to the destination collection, etc.


Note sure how to MIGRATE an entire collection. By providing the full 
list of split.keys? I'd be surprised if that was doable, but I guess it 
will skip the splitting part, which makes it easier ;-) Or much tougher 
by splitting around all the ranges. More seriously, doing a MERGEINDEX 
at the core level might not be a bad alternative, providing the hash 
ranges are compatible.


Damien

On 07/07/2014 05:14 PM, Shawn Heisey wrote:

I don't think you'd want to disable mmap. It could be done, by choosing
another DirectoryFactory object. Adding memory is likely to be the only
sane way forward.

Another possibility would be to bump up the maxShardsPerNode value and
build the new collection (with the proper number of shards) only on the
new machines... Then when they are built, move them to their proper homes
and manually adjust the cluster state in zookeeper. This will still
generate a lot of I/O, but hopefully it will last for less time on the
wall clock, and it will be something you can do when load is low.

After that done and you've switched to it, you can add replicas with
either the addreplica collections api or with the core admin api. You
should be on the newest Solr version... Lots of bugs have been found and
fixed.

One thing I wonder is whether the MIGRATE api can be used on an entire
collection. It says it works by shard key, but I suspect that most users
will not be using that functionality.

Thanks,
Shawn



Re: Parallel optimize of index on SolrCloud.

2014-07-08 Thread Walter Underwood
You probably do not need to force merge (mistakenly called "optimize") your 
index.

Solr does automatic merges, which work just fine.

There are only a few situations where a forced merge is even a good idea. The 
most common one is a replicated (non-cloud) setup with a full reindex every 
night.

If you need Solr Cloud, I cannot think of a situation where you would want a 
forced merge.

wunder

On Jul 8, 2014, at 2:01 AM, Modassar Ather  wrote:

> Hi,
> 
> Need to optimize index created using CloudSolrServer APIs under SolrCloud
> setup of 3 instances on separate machines. Currently it optimizes
> sequentially if I invoke cloudSolrServer.optimize().
> 
> To make it parallel I tried making three separate HttpSolrServer instances
> and invoked httpSolrServer.opimize() on them parallely but still it seems
> to be doing optimization sequentially.
> 
> I tried invoking optimize directly using HttpPost with following url and
> parameters but still it seems to be sequential.
> *URL* : http://host:port/solr/collection/update
> 
> *Parameters*:
> params.add(new BasicNameValuePair("optimize", "true"));
> params.add(new BasicNameValuePair("maxSegments", "1"));
> params.add(new BasicNameValuePair("waitFlush", "true"));
> params.add(new BasicNameValuePair("distrib", "false"));
> 
> Kindly provide your suggestion and help.
> 
> Regards,
> Modassar






Re: Slow inserts when using Solr Cloud

2014-07-08 Thread Mark Miller
Updates are currently done locally before concurrently being sent to all 
replicas - so on a single update, you can expect 2x just from that.

As for your results, it sounds like perhaps there is more overhead than we 
would like in the code that sends to replicas and forwards updates? Someone 
would have to dig in to really know I think. I would doubt it’s a configuration 
issue, but you never know.

-- 
Mark Miller
about.me/markrmiller

On July 8, 2014 at 9:18:28 AM, Ian Williams (NWIS - Applications Design) 
(ian.willi...@wales.nhs.uk) wrote:

Hi  

I'm encountering a surprisingly high increase in response times when I insert 
new documents into a SolrCloud, compared with a standalone Solr instance.  

I have a SolrCloud set up for test and evaluation purposes. I have four shards, 
each with a leader and a replica, distributed over four Windows virtual 
servers. I have zookeeper running on three of the four servers. There are not 
many documents in my SolrCloud (just a few hundred). I am using composite id 
routing, specifying a prefix to my document ids which is then used by Solr to 
determine which shard the document should be stored on.  

I determine in advance which shard a document with a given id prefix will end 
up in, by trying it out beforehand. I then try the following scenarios, using 
inserts without commits. E.g. I use:  
curl http://servername:port/solr/update -H "Content-Type: text/xml" 
--data-binary @test.txt  

1. Insert a document, sending it to the server hosting the correct shard, with 
replicas turned off (response time <20ms)  
I find that if I 'switch off' the replicas for my shard (by shutting down Solr 
for the replicas), and then I send the new document to the server hosting the 
leader for the correct shard, then I get a very fast response, i.e. under 10ms, 
which is similar to the performance I get when not using SolrCloud. This is 
expected, as I've removed any overhead to do with replicas or routing to the 
correct shard.  

2. Insert a document, sending it to the server hosting the correct shard, but 
with replicas turned on (response time approx 250ms)  
If I switch on the replica for that shard, then my average response time for an 
insert increases from <10ms to around 250ms. Now I expect an overhead, because 
the leader has to find out where the replica is (from Zookeeper?) and then 
forward the request to that replica, then wait for a reply - but an increase 
from <20ms to 250ms seems very high?  

3. Insert a document, sending it to a server hosting the incorrect shard, with 
replicas turned on (response time approx 500ms)  
If I do the same thing again but this time send to the server hosting a 
different shard to the shard my document will end up in, the average response 
times increase again to around 500ms. Again, I'd expect an increase because of 
the extra step of needing to forward to the correct shard, but the increase 
seems very high?  


Should I expect this much of an overhead for shard routing and replicas, or 
might this indicate a problem in my configuration?  

Many thanks  
Ian  

---  
Mae’r wybodaeth a gynhwysir yn y neges e-bost hon ac yn unrhyw atodiadau’n 
gyfrinachol. Os ydych yn ei derbyn ar gam, rhowch wybod i’r anfonwr a’i dileu’n 
ddi-oed. Ni fwriedir i ddatgelu i unrhyw un heblaw am y derbynnydd, boed yn 
anfwriadol neu fel arall, hepgor cyfrinachedd. Efallai bydd Gwasanaeth Gwybodeg 
GIG Cymru (NWIS) yn monitro ac yn cofnodi pob neges e-bost rhag firysau a 
defnydd amhriodol. Mae’n bosibl y bydd y neges e-bost hon ac unrhyw atebion neu 
atodiadau dilynol yn ddarostyngedig i’r Ddeddf Rhyddid Gwybodaeth. Mae’r farn a 
fynegir yn y neges e-bost hon yn perthyn i’r anfonwr ac nid ydynt o reidrwydd 
yn perthyn i NWIS.  

The information included in this email and any attachments is confidential. If 
received in error, please notify the sender and delete it immediately. 
Disclosure to any party other than the addressee, whether unintentional or 
otherwise, is not intended to waive confidentiality. The NHS Wales Informatics 
Service (NWIS) may monitor and record all emails for viruses and inappropriate 
use. This e-mail and any subsequent replies or attachments may be subject to 
the Freedom of Information Act. The views expressed in this email are those of 
the sender and not necessarily of NWIS.  
---  
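
As a side note on scenario 3 above: with SolrJ 4.5 or later, CloudSolrServer
can hash the document id itself and send each update straight to the right
shard leader, avoiding the extra forwarding hop. A hedged sketch (ZooKeeper
hosts, collection, and field names are assumptions):

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
server.setDefaultCollection("collection1");
server.setParallelUpdates(true);  // talk to the leaders of affected shards concurrently
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "shardA!1234");  // composite id routing prefix, as in the thread
doc.addField("title", "test doc");
server.add(doc);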


Re: I need a replacement for the QueryElevation Component

2014-07-08 Thread O. Klein
You can sponsor more than 1 document per keyword in elevate.xml, along these
lines (the ids here are placeholders):

<query text="keyword">
  <doc id="doc1" />
  <doc id="doc2" />
</query>

And you might want to try making the queryFieldType a string instead
of another FieldType. I found that textFields remove whitespace and
concatenate the tokens.

Not sure if this is intended or not.









RE: Exact Match first in the list.

2014-07-08 Thread EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)
Thanks Shawn, I am already using boosting, but the OR condition works for me 
as you mentioned.

One question

If I search for "(TAGs)" with the parentheses, it returns a lot of results, but 
if I try something like "TAGs" without them, it returns fewer. Why do the "( )" 
characters change the results? Shouldn't both take the exact match?

Let me know if I am missing something.

Thanks


-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: Monday, July 07, 2014 8:22 PM
To: solr-user@lucene.apache.org
Subject: Re: Exact Match first in the list.

> Hi, I have a situation where I am applying the below search rules.
>
> When I search columns for the full text search, "Product Variant 
> Name", the exact match has to be first in the list, and other matches 
> like product or variant or name or any combination will be next in the 
> results.
>
> Any thoughts on which analyzer or tokenizer or filter I need to use?


This is more a matter of boosting than analysis.

If you are using edismax, this is particularly easy. Just put large boost 
values on the fields in the pf parameter, and you'd likely want to use the same 
field list as the qf parameter.

If you are not using edismax and can construct such a query yourself, you can 
boost the phrase over the individual terms. Here's a sample query:

"Product Variant Name"^10 OR (Product Variant Name)

This is essentially what edismax will do with a boost on the pf values, except 
that it will work with more than one field. The edismax parser is a wonderful 
creation.

Thanks,
Shawn
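
A minimal SolrJ sketch of that edismax setup (the core URL and the field name
product_name are assumptions; boost values would need tuning):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
SolrQuery query = new SolrQuery("Product Variant Name");
query.set("defType", "edismax");
query.set("qf", "product_name");     // fields the individual terms match on
query.set("pf", "product_name^50");  // heavy phrase boost floats the exact match to the top
server.query(query);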



Slow inserts when using Solr Cloud

2014-07-08 Thread Ian Williams (NWIS - Applications Design)
Hi

I'm encountering a surprisingly high increase in response times when I insert 
new documents into a SolrCloud, compared with a standalone Solr instance.

I have a SolrCloud set up for test and evaluation purposes.  I have four 
shards, each with a leader and a replica, distributed over four Windows virtual 
servers.  I have zookeeper running on three of the four servers. There are not 
many documents in my SolrCloud (just a few hundred).   I am using composite id 
routing, specifying a prefix to my document ids which is then used by Solr to 
determine which shard the document should be stored on.

I determine in advance which shard a document with a given id prefix will end 
up in, by trying it out beforehand.  I then try the following scenarios, using 
inserts without commits.  E.g. I use:
curl http://servername:port/solr/update -H "Content-Type: text/xml" 
--data-binary @test.txt

1. Insert a document, sending it to the server hosting the correct shard, with 
replicas turned off (response time <20ms)
I find that if I 'switch off' the replicas for my shard (by shutting down Solr 
for the replicas), and then I send the new document to the server hosting the 
leader for the correct shard, then I get a very fast response, i.e. under 10ms, 
which is similar to the performance I get when not using SolrCloud.  This is 
expected, as I've removed any overhead to do with replicas or routing to the 
correct shard.

2. Insert a document, sending it to the server hosting the correct shard, but 
with replicas turned on (response time approx 250ms)
If I switch on the replica for that shard, then my average response time for an 
insert increases from <10ms  to around 250ms.  Now I expect an overhead, 
because the leader has to find out where the replica is (from Zookeeper?) and 
then forward the request to that replica, then wait for a reply - but an 
increase from <20ms to 250ms seems very high?

3. Insert a document, sending it to a server hosting the incorrect shard, with 
replicas turned on (response time approx 500ms)
If I do the same thing again but this time send to the server hosting a 
different shard to the shard my document will end up in, the average response 
times increase again to around 500ms.  Again, I'd expect an increase because of 
the extra step of needing to forward to the correct shard, but the increase 
seems very high?


Should I expect this much of an overhead for shard routing and replicas, or 
might this indicate a problem in my configuration?

Many thanks
Ian

---
Mae’r wybodaeth a gynhwysir yn y neges e-bost hon ac yn unrhyw atodiadau’n 
gyfrinachol. Os ydych yn ei derbyn ar gam, rhowch wybod i’r anfonwr a’i dileu’n 
ddi-oed. Ni fwriedir i ddatgelu i unrhyw un heblaw am y derbynnydd, boed yn 
anfwriadol neu fel arall, hepgor cyfrinachedd. Efallai bydd Gwasanaeth Gwybodeg 
GIG Cymru (NWIS) yn monitro ac yn cofnodi pob neges e-bost rhag firysau a 
defnydd amhriodol. Mae’n bosibl y bydd y neges e-bost hon ac unrhyw atebion neu 
atodiadau dilynol yn ddarostyngedig i’r Ddeddf Rhyddid Gwybodaeth. Mae’r farn a 
fynegir yn y neges e-bost hon yn perthyn i’r anfonwr ac nid ydynt o reidrwydd 
yn perthyn i NWIS.

The information included in this email and any attachments is confidential. If 
received in error, please notify the sender and delete it immediately. 
Disclosure to any party other than the addressee, whether unintentional or 
otherwise, is not intended to waive confidentiality. The NHS Wales Informatics 
Service (NWIS) may monitor and record all emails for viruses and inappropriate 
use. This e-mail and any subsequent replies or attachments may be subject to 
the Freedom of Information Act. The views expressed in this email are those of 
the sender and not necessarily of NWIS.
---


Re: don't count facet on blank values

2014-07-08 Thread Aman Tandon
No, both are the same for me.

With Regards
Aman Tandon


On Tue, Jul 8, 2014 at 4:01 PM, Alexandre Rafalovitch 
wrote:

> Right, but the blank field and missing field are different things. Are
> they for you? If yes, then correct, you are stuck with getting them
> back. But if "" blank field is the same as missing/empty field, then
> you can pre-process to unify them.
>
> Regards,
>Alex.
> Personal website: http://www.outerthoughts.com/
> Current project: http://www.solr-start.com/ - Accelerating your Solr
> proficiency
>
>
> On Tue, Jul 8, 2014 at 5:26 PM, Aman Tandon 
> wrote:
> > @Alex, yes, we need them to be indexed and stored, as we are doing some
> > processing when fields are blank.
> >
> > @Gora Thanks, I will try this one.
> >
> > Thanks for your quick replies.
> >
> > With Regards
> > Aman Tandon
> >
> >
> > On Tue, Jul 8, 2014 at 3:53 PM, Gora Mohanty  wrote:
> >
> >> On 8 July 2014 15:46, Aman Tandon  wrote:
> >> > Hi,
> >> >
> >> > Is it possible to not count the facets for the blank values?
> >> > e.g. cat:
> >> [...]
> >>
> >> Either filter them out in the query, or remove them client-side when
> >> displaying the results.
> >>
> >> Regards,
> >> Gora
> >>
>


I need a replacement for the QueryElevation Component

2014-07-08 Thread eShard
Good morning to one and all,
I'm using Solr 4.0 Final and I've been struggling mightily with the
elevation component.
It is too limited for our needs; it doesn't handle phrases very well and I
need to have more than one doc with the same keyword or phrase.
So, I need a better solution. One that allows us to tag the doc with
keywords that clearly identify it as a promoted document would be ideal.
I tried using an external file field but that only allows numbers and not
strings (please correct me if I'm wrong)
EFF would be ideal if there is a way to make it take strings.
I also need an easy way to add these tags to specific docs.
If possible, I would like to avoid creating a separate elevation core but it
may come down to that...

Thank you, 




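
One hedged sketch of the tagging idea above: index a promo_tags string field
(hypothetical name) holding the promotion keywords, then boost on it at query
time. Unlike elevation this only influences scoring; it does not guarantee
position:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
SolrQuery q = new SolrQuery("some keyword");
q.set("defType", "edismax");
q.set("qf", "title body");                        // assumed search fields
q.set("bq", "promo_tags:\"some keyword\"^1000");  // heavy boost for tagged docs
server.query(q);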


[ANN] Solr Users Thailand - unofficial group

2014-07-08 Thread Alexandre Rafalovitch
Hello,

A new Google Group has been recently started for Solr Users who want
to discuss Solr in Thai or need to discuss Solr issues around Thai
language (in Thai or English).
https://groups.google.com/forum/#!forum/solr-user-thailand

The group is monitored by the local Solr consultancy, one of the Thai
LucidWorks employees, and myself. It has just started, but if this
language is of interest to you, please join and help build a
vibrant community.

As mentioned in the subject, this is not an official group. I hope
though it will become active enough over time to be listed next to the
other user groups on the Wiki.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


JOB: Solr / Elasticsearch engineer @ Sematext

2014-07-08 Thread Otis Gospodnetic
Hi,

I think most people on this list have heard of Sematext, so I'll skip the
company info and just jump to the
meat, which involves a lot of fun work with Solr and/or Elasticsearch:

We have an opening for an engineer who knows either Elasticsearch or Solr
or both and wants to use these technologies to implement search and
analytics solutions for both Sematext's own products such as SPM (monitoring,
alerting, machine learning-based anomaly detection, etc.) and Logsene
(logging), as well as for Sematext's clients.

More info at:
* http://blog.sematext.com/2014/07/07/job-elasticsearch-solr-engineer/
* http://sematext.com/about/jobs.html

Thanks,
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


Re: Facets on Nested documents

2014-07-08 Thread Walter Liguori
Yes, I also have the same problem.
In my case I have 2 types (parent and children) in a single collection and I
want to retrieve only the parents, with a facet on a children field.
I've seen that it is possible via block join query (available as of Solr 4.5).
I have Solr 1.2 and I've thought about a static facet field calculated during
indexing time, but I don't see any guide or reference about it.
Walter

Ing. Walter Liguori


2014-07-07 17:59 GMT+02:00 adfel70 :

> Hi,
>
> I indexed different types (with different fields) of child docs for every parent.
> I want to facet on a field in one type of child doc, and after that to do
> another facet on a different type of child doc. It doesn't work...
>
> Any idea how i can do something like that?
>
> thanks.
>
>
>
>
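
For reference, the shape of the block join query mentioned above (Solr 4.5+;
the type discriminator and child field names are assumptions):

import org.apache.solr.client.solrj.SolrQuery;

// Returns only parent documents that have a child matching the inner query.
SolrQuery q = new SolrQuery("{!parent which=\"type:parent\"}childfield:somevalue");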


Re: don't count facet on blank values

2014-07-08 Thread Alexandre Rafalovitch
Right, but the blank field and missing field are different things. Are
they for you? If yes, then correct, you are stuck with getting them
back. But if "" blank field is the same as missing/empty field, then
you can pre-process to unify them.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Tue, Jul 8, 2014 at 5:26 PM, Aman Tandon  wrote:
> @Alex, yes, we need them to be indexed and stored, as we are doing some
> processing when fields are blank.
>
> @Gora Thanks, I will try this one.
>
> Thanks for your quick replies.
>
> With Regards
> Aman Tandon
>
>
> On Tue, Jul 8, 2014 at 3:53 PM, Gora Mohanty  wrote:
>
>> On 8 July 2014 15:46, Aman Tandon  wrote:
>> > Hi,
>> >
>> > Is it possible to not count the facets for the blank values?
>> > e.g. cat:
>> [...]
>>
>> Either filter them out in the query, or remove them client-side when
>> displaying the results.
>>
>> Regards,
>> Gora
>>


Re: don't count facet on blank values

2014-07-08 Thread Aman Tandon
@Alex, yes, we need them to be indexed and stored, as we are doing some
processing when fields are blank.

@Gora Thanks, I will try this one.

Thanks for your quick replies.

With Regards
Aman Tandon


On Tue, Jul 8, 2014 at 3:53 PM, Gora Mohanty  wrote:

> On 8 July 2014 15:46, Aman Tandon  wrote:
> > Hi,
> >
> > Is it possible to not count the facets for the blank values?
> > e.g. cat:
> [...]
>
> Either filter them out in the query, or remove them client-side when
> displaying the results.
>
> Regards,
> Gora
>


Re: don't count facet on blank values

2014-07-08 Thread Gora Mohanty
On 8 July 2014 15:46, Aman Tandon  wrote:
> Hi,
>
> Is it possible to not count the facets for the blank values?
> e.g. cat:
[...]

Either filter them out in the query, or remove them client-side when
displaying the results.

Regards,
Gora


Re: don't count facet on blank values

2014-07-08 Thread Alexandre Rafalovitch
Do you need those values stored/indexed? If not, why not remove them
before they hit Solr with an appropriate UpdateRequestProcessor?

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Tue, Jul 8, 2014 at 5:16 PM, Aman Tandon  wrote:
> Hi,
>
> Is it possible to not count the facets for the blank values?
> e.g. cat:
>
> "cats":[*"",34324,*
> "10",8635,
> "20",8226,
> "50",5162,
> "30",759,
> "100",188,
> "40",13,
> "200",7]
>
> How can this be done?
>
> With Regards
> Aman Tandon
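
For reference, a minimal solrconfig.xml sketch of that suggestion (Solr 4.x
field-mutating processors; the chain name is arbitrary). Note it would not fit
the case above where the blank values must stay indexed:

<updateRequestProcessorChain name="remove-blanks" default="true">
  <processor class="solr.RemoveBlankFieldUpdateProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>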


Re: Fwd: Language detection for solr 3.6.1

2014-07-08 Thread Poornima Jay
I'm using the Google library which I had mentioned in my first mail, saying I'm 
using http://code.google.com/p/language-detection/. I have downloaded the jar 
file from the below URL:

https://www.versioneye.com/java/org.apache.solr:solr-langid/3.6.1


Please let me know from where I need to download the correct jar file.

Regards,
Poornima


On Tuesday, 8 July 2014 3:42 PM, Alexandre Rafalovitch  
wrote:
 


I just realized you are not using Solr language detect libraries. You
are using third party one. You did mention that in your first message.

I don't see that library integrated with Solr though, just as a
standalone library. So, you can't just plug it in.

Is there any reason you cannot use one of the two libraries Solr does
already have (Tika's and Google's)? What's so special about that one?

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency



On Tue, Jul 8, 2014 at 5:08 PM, Poornima Jay  wrote:
> When I use the solr-langid-3.5.0.jar file, after reloading the core I am getting 
> the below error
>
> SEVERE: java.lang.NoClassDefFoundError: net/arnx/jsonic/JSONException
>
>
> Even after adding the solr-jsonic-3.5.0.jar file in the webapps folder.
>
> Thanks,
> Poornima
>
>
>
> On Tuesday, 8 July 2014 3:36 PM, Alexandre Rafalovitch  
> wrote:
>
>
>
> -- Forwarded message --
>
> From: Poornima Jay 
> Date: Tue, Jul 8, 2014 at 5:03 PM
> Subject: Re: Language detection for solr 3.6.1
>
>
> When I try to use the solr-langid-3.6.1.jar file in my path
> /apache-tomcat-5.5.25/webapps/solr_multilangue_3.6_jar/WEB-INF/lib/
> and define the path in the solrconfig.xml as below
>
> <lib dir="/home/searchuser/apache-tomcat-5.5.25/webapps/solr_multilangue_3.6_jar/WEB-INF/lib/"
> regex="solr-langid-.*\.jar" />
>
> I am getting the below error while reloading the core.
>
> SEVERE: java.lang.NoClassDefFoundError:
> com/cybozu/labs/langdetect/DetectorFactory
>
> Please advise.
>
> Thanks,
> Poornima
>
>
> On Tuesday, 8 July 2014 9:58 AM, Alexandre Rafalovitch
>  wrote:
>
>
> If you are having troubles with jar location, just use absolute path
> in your lib statement and use path, not dir/regex. That will complain
> louder. You should be using the latest jar matching the version, they
> should be shipped with Solr itself.
>
> Regards,
>   Alex.
> Personal website: http://www.outerthoughts.com/
> Current project: http://www.solr-start.com/ - Accelerating your Solr 
> proficiency
>
>
> On Tue, Jul 8, 2014 at 11:14 AM, Poornima Jay
>  wrote:
>> I am facing an issue with the jar file location. Where should I place
>> solr-langid-3.6.1.jar? If I place it in the instance folder inside
>> /lib/solr-langid-3.6.1.jar the language detection classes are not loaded.
>> Should I use solr-langid-3.5.1.jar in Solr 3.6.1?
>>
>> Can you please attach the schema file also for reference.
>>
>> 
>> 
>>
>> where exactly the jar file should be placed? /dist/ or /contrib/langid/lib/
>>
>> Thanks for your time.
>>
>> Regards,
>> Poornima
>>
>>
>>
>> On Monday, 7 July 2014 2:42 PM, Alexandre Rafalovitch 
>> wrote:
>>
>>
>> I've had an example in my book:
>> https://github.com/arafalov/solr-indexing-book/blob/master/published/languages/conf/solrconfig.xml
>> , though it was for Solr 4.2+. Solr in Action also has a section on
>> multilingual indexing. There is no generic advice, as everybody seems
>> to have slightly different multilingual requirements, but the books
>> will at least discuss the main issues.
>>
>> Regarding your specific email from a week ago, you haven't actually
>> said what the problem was, just what you did. So, we don't know
>> where you are stuck and what - specifically - you need help with.
>>
>> Regards,
>>  Alex.
>> Personal website: http://www.outerthoughts.com/
>> Current project: http://www.solr-start.com/ - Accelerating your Solr
>> proficiency
>>
>>
>> On Mon, Jul 7, 2014 at 4:06 PM, Poornima Jay 
>> wrote:
>>> Hi,
>>>
>>> Please let me know if anyone has used Google language detection for
>>> implementing multilanguage search in one schema.
>>>
>>> Thanks,
>>> Poornima
>>>
>>>
>>>
>>>
>>> On Tuesday, 1 July 2014 6:54 PM, Poornima Jay 
>>> wrote:
>>>
>>>
>>> Hi,
>>>
>>> Can anyone please let me know how to integrate
>>> http://code.google.com/p/language-detection/ in Solr 3.6.1. I want these
>>> languages (English, Chinese simplified, Chinese traditional, Japanese, and
>>> Korean) added in one schema, i.e. multilingual search from a single schema
>>> file.
>>>
>>> I tried adding solr-langdetect-3.5.0.jar in my /solr/contrib/langid/lib/
>>> location and in /webapps/solr/WEB-INF/contrib/langid/lib/ and made changes
>>> in the solrconfig.xml as below
>>>
>>> >> class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
>>>
>>>  
>>>    >>
>>> class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
>>>    
>>>    content_eng
>>>    true
>>>    content_eng,content_

don't count facet on blank values

2014-07-08 Thread Aman Tandon
Hi,

Is it possible to not count the facets for the blank values?
e.g. cat:

"cats":[*"",34324,*
"10",8635,
"20",8226,
"50",5162,
"30",759,
"100",188,
"40",13,
"200",7]

How can this be done?

With Regards
Aman Tandon


Re: Fwd: Language detection for solr 3.6.1

2014-07-08 Thread Alexandre Rafalovitch
I just realized you are not using Solr language detect libraries. You
are using third party one. You did mention that in your first message.

I don't see that library integrated with Solr though, just as a
standalone library. So, you can't just plug it in.

Is there any reason you cannot use one of the two libraries Solr does
already have (Tika's and Google's)? What's so special about that one?

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Tue, Jul 8, 2014 at 5:08 PM, Poornima Jay  wrote:
> When I use the solr-langid-3.5.0.jar file, after reloading the core I am getting 
> the below error
>
> SEVERE: java.lang.NoClassDefFoundError: net/arnx/jsonic/JSONException
>
>
> Even after adding the solr-jsonic-3.5.0.jar file in the webapps folder.
>
> Thanks,
> Poornima
>
>
>
> On Tuesday, 8 July 2014 3:36 PM, Alexandre Rafalovitch  
> wrote:
>
>
>
> -- Forwarded message --
>
> From: Poornima Jay 
> Date: Tue, Jul 8, 2014 at 5:03 PM
> Subject: Re: Language detection for solr 3.6.1
>
>
> When I try to use the solr-langid-3.6.1.jar file in my path
> /apache-tomcat-5.5.25/webapps/solr_multilangue_3.6_jar/WEB-INF/lib/
> and define the path in the solrconfig.xml as below
>
> <lib dir="/home/searchuser/apache-tomcat-5.5.25/webapps/solr_multilangue_3.6_jar/WEB-INF/lib/"
> regex="solr-langid-.*\.jar" />
>
> I am getting the below error while reloading the core.
>
> SEVERE: java.lang.NoClassDefFoundError:
> com/cybozu/labs/langdetect/DetectorFactory
>
> Please advise.
>
> Thanks,
> Poornima
>
>
> On Tuesday, 8 July 2014 9:58 AM, Alexandre Rafalovitch
>  wrote:
>
>
> If you are having troubles with jar location, just use absolute path
> in your lib statement and use path, not dir/regex. That will complain
> louder. You should be using the latest jar matching the version, they
> should be shipped with Solr itself.
>
> Regards,
>   Alex.
> Personal website: http://www.outerthoughts.com/
> Current project: http://www.solr-start.com/ - Accelerating your Solr 
> proficiency
>
>
> On Tue, Jul 8, 2014 at 11:14 AM, Poornima Jay
>  wrote:
>> I am facing an issue with the jar file location. Where should I place
>> solr-langid-3.6.1.jar? If I place it in the instance folder inside
>> /lib/solr-langid-3.6.1.jar the language detection classes are not loaded.
>> Should I use solr-langid-3.5.1.jar in Solr 3.6.1?
>>
>> Can you please attach the schema file also for reference.
>>
>> 
>> 
>>
>> where exactly the jar file should be placed? /dist/ or /contrib/langid/lib/
>>
>> Thanks for your time.
>>
>> Regards,
>> Poornima
>>
>>
>>
>> On Monday, 7 July 2014 2:42 PM, Alexandre Rafalovitch 
>> wrote:
>>
>>
>> I've had an example in my book:
>> https://github.com/arafalov/solr-indexing-book/blob/master/published/languages/conf/solrconfig.xml
>> , though it was for Solr 4.2+. Solr in Action also has a section on
>> multilingual indexing. There is no generic advice, as everybody seems
>> to have slightly different multilingual requirements, but the books
>> will at least discuss the main issues.
>>
>> Regarding your specific email from a week ago, you haven't actually
>> said what the problem was, just what you did. So, we don't know
>> where you are stuck and what - specifically - you need help with.
>>
>> Regards,
>>  Alex.
>> Personal website: http://www.outerthoughts.com/
>> Current project: http://www.solr-start.com/ - Accelerating your Solr
>> proficiency
>>
>>
>> On Mon, Jul 7, 2014 at 4:06 PM, Poornima Jay 
>> wrote:
>>> Hi,
>>>
>>> Please let me know if anyone has used Google language detection for
>>> implementing multilanguage search in one schema.
>>>
>>> Thanks,
>>> Poornima
>>>
>>>
>>>
>>>
>>> On Tuesday, 1 July 2014 6:54 PM, Poornima Jay 
>>> wrote:
>>>
>>>
>>> Hi,
>>>
>>> Can anyone please let me know how to integrate
>>> http://code.google.com/p/language-detection/ in Solr 3.6.1. I want these
>>> languages (English, Chinese simplified, Chinese traditional, Japanese, and
>>> Korean) added in one schema, i.e. multilingual search from a single schema
>>> file.
>>>
>>> I tried adding solr-langdetect-3.5.0.jar in my /solr/contrib/langid/lib/
>>> location and in /webapps/solr/WEB-INF/contrib/langid/lib/ and made changes
>>> in the solrconfig.xml as below
>>>
>>> >> class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
>>>
>>>  
>>>>>
>>> class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
>>>
>>>content_eng
>>>true
>>>content_eng,content_ja
>>>en,ja
>>>en:english ja:japanese
>>>en
>>>
>>>
>>>  
>>>
>>>  
>>>
>>>langid
>>>
>>>  
>>>
>>> Please suggest me the solution.
>>>
>>> Thanks,
>>> Poornima
>>>
>>
>>


Re: Fwd: Language detection for solr 3.6.1

2014-07-08 Thread Poornima Jay
When I use the solr-langid-3.5.0.jar file, after reloading the core I am getting the 
below error 

SEVERE: java.lang.NoClassDefFoundError: net/arnx/jsonic/JSONException


Even after adding the solr-jsonic-3.5.0.jar file in the webapps folder.

Thanks,
Poornima



On Tuesday, 8 July 2014 3:36 PM, Alexandre Rafalovitch  
wrote:
 


-- Forwarded message --

From: Poornima Jay 
Date: Tue, Jul 8, 2014 at 5:03 PM
Subject: Re: Language detection for solr 3.6.1


When I try to use the solr-langid-3.6.1.jar file in my path
/apache-tomcat-5.5.25/webapps/solr_multilangue_3.6_jar/WEB-INF/lib/
and define the path in the solrconfig.xml as below

<lib dir="/home/searchuser/apache-tomcat-5.5.25/webapps/solr_multilangue_3.6_jar/WEB-INF/lib/"
regex="solr-langid-.*\.jar" />
I am getting the below error while reloading the core.

SEVERE: java.lang.NoClassDefFoundError:
com/cybozu/labs/langdetect/DetectorFactory

Please advise.

Thanks,
Poornima


On Tuesday, 8 July 2014 9:58 AM, Alexandre Rafalovitch
 wrote:


If you are having troubles with jar location, just use absolute path
in your lib statement and use path, not dir/regex. That will complain
louder. You should be using the latest jar matching the version, they
should be shipped with Solr itself.

Regards,
  Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Tue, Jul 8, 2014 at 11:14 AM, Poornima Jay
 wrote:
> I am facing an issue with the jar file location. Where should I place
> solr-langid-3.6.1.jar? If I place it in the instance folder inside
> /lib/solr-langid-3.6.1.jar the language detection classes are not loaded.
> Should I use solr-langid-3.5.1.jar in Solr 3.6.1?
>
> Can you please attach the schema file also for reference.
>
> 
> 
>
> where exactly the jar file should be placed? /dist/ or /contrib/langid/lib/
>
> Thanks for your time.
>
> Regards,
> Poornima
>
>
>
> On Monday, 7 July 2014 2:42 PM, Alexandre Rafalovitch 
> wrote:
>
>
> I've had an example in my book:
> https://github.com/arafalov/solr-indexing-book/blob/master/published/languages/conf/solrconfig.xml
> , though it was for Solr 4.2+. Solr in Action also has a section on
> multilingual indexing. There is no generic advice, as everybody seems
> to have slightly different multilingual requirements, but the books
> will at least discuss the main issues.
>
> Regarding your specific email from a week ago, you haven't actually
> said what the problem was, just what you did. So, we don't know
> where you are stuck and what - specifically - you need help with.
>
> Regards,
>  Alex.
> Personal website: http://www.outerthoughts.com/
> Current project: http://www.solr-start.com/ - Accelerating your Solr
> proficiency
>
>
> On Mon, Jul 7, 2014 at 4:06 PM, Poornima Jay 
> wrote:
>> Hi,
>>
>> Please let me know if anyone has used Google language detection for
>> implementing multilanguage search in one schema.
>>
>> Thanks,
>> Poornima
>>
>>
>>
>>
>> On Tuesday, 1 July 2014 6:54 PM, Poornima Jay 
>> wrote:
>>
>>
>> Hi,
>>
>> Can anyone please let me know how to integrate
>> http://code.google.com/p/language-detection/ in Solr 3.6.1. I want these
>> languages (English, Chinese simplified, Chinese traditional, Japanese, and
>> Korean) added in one schema, i.e. multilingual search from a single schema
>> file.
>>
>> I tried adding solr-langdetect-3.5.0.jar in my /solr/contrib/langid/lib/
>> location and in /webapps/solr/WEB-INF/contrib/langid/lib/ and made changes
>> in the solrconfig.xml as below
>>
>> > class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
>>
>>  
>>    >
>> class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
>>    
>>    content_eng
>>    true
>>    content_eng,content_ja
>>    en,ja
>>    en:english ja:japanese
>>    en
>>    
>>    
>>  
>>
>>  
>>    
>>    langid
>>    
>>  
>>
>> Please suggest me the solution.
>>
>> Thanks,
>> Poornima
>>
>
>
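
For reference, a hedged reconstruction of the general shape of such a chain in
solrconfig.xml (the field and language values come from the thread; the
langid.langField name is an assumption, and the mapping parameters are
omitted):

<updateRequestProcessorChain name="langid">
  <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <lst name="defaults">
      <str name="langid.fl">content_eng,content_ja</str>
      <str name="langid.langField">language</str>
      <str name="langid.whitelist">en,ja</str>
      <str name="langid.fallback">en</str>
    </lst>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

The /update request handler then has to reference the chain with
<str name="update.chain">langid</str> in its defaults.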

Fwd: Language detection for solr 3.6.1

2014-07-08 Thread Alexandre Rafalovitch
-- Forwarded message --
From: Poornima Jay 
Date: Tue, Jul 8, 2014 at 5:03 PM
Subject: Re: Language detection for solr 3.6.1


When I try to use the solr-langid-3.6.1.jar file in my path
/apache-tomcat-5.5.25/webapps/solr_multilangue_3.6_jar/WEB-INF/lib/
and define the path in the solrconfig.xml as below

<lib dir="/home/searchuser/apache-tomcat-5.5.25/webapps/solr_multilangue_3.6_jar/WEB-INF/lib/"
regex="solr-langid-.*\.jar" />
I am getting the below error while reloading the core.

SEVERE: java.lang.NoClassDefFoundError:
com/cybozu/labs/langdetect/DetectorFactory

Please advise.

Thanks,
Poornima


On Tuesday, 8 July 2014 9:58 AM, Alexandre Rafalovitch
 wrote:


If you are having troubles with jar location, just use absolute path
in your lib statement and use path, not dir/regex. That will complain
louder. You should be using the latest jar matching the version, they
should be shipped with Solr itself.

Regards,
  Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Tue, Jul 8, 2014 at 11:14 AM, Poornima Jay
 wrote:
> I am facing an issue with the jar file location. Where should I place
> solr-langid-3.6.1.jar? If I place it in the instance folder inside
> /lib/solr-langid-3.6.1.jar the language detection classes are not loaded.
> Should I use solr-langid-3.5.1.jar in Solr 3.6.1?
>
> Can you please attach the schema file also for reference.
>
> 
> 
>
> where exactly the jar file should be placed? /dist/ or /contrib/langid/lib/
>
> Thanks for your time.
>
> Regards,
> Poornima
>
>
>
> On Monday, 7 July 2014 2:42 PM, Alexandre Rafalovitch 
> wrote:
>
>
> I've had an example in my book:
> https://github.com/arafalov/solr-indexing-book/blob/master/published/languages/conf/solrconfig.xml
> , though it was for Solr 4.2+. Solr in Action also has a section on
> multilingual indexing. There is no generic advice, as everybody seems
> to have slightly different multilingual requirements, but the books
> will at least discuss the main issues.
>
> Regarding your specific email from a week ago, you haven't actually
> said what the problem was, just what you did. So, we don't know
> where you are stuck and what - specifically - you need help with.
>
> Regards,
>  Alex.
> Personal website: http://www.outerthoughts.com/
> Current project: http://www.solr-start.com/ - Accelerating your Solr
> proficiency
>
>
> On Mon, Jul 7, 2014 at 4:06 PM, Poornima Jay 
> wrote:
>> Hi,
>>
>> Please let me know if anyone has used Google language detection for
>> implementing multilanguage search in one schema.
>>
>> Thanks,
>> Poornima
>>
>>
>>
>>
>> On Tuesday, 1 July 2014 6:54 PM, Poornima Jay 
>> wrote:
>>
>>
>> Hi,
>>
>> Can anyone please let me know how to integrate
>> http://code.google.com/p/language-detection/ in Solr 3.6.1. I want these
>> languages (English, Chinese simplified, Chinese traditional, Japanese, and
>> Korean) added in one schema, i.e. multilingual search from a single schema
>> file.
>>
>> I tried adding solr-langdetect-3.5.0.jar in my /solr/contrib/langid/lib/
>> location and in /webapps/solr/WEB-INF/contrib/langid/lib/ and made changes
>> in the solrconfig.xml as below
>>
>> > class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
>>
>>  
>>>
>> class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
>>
>>content_eng
>>true
>>content_eng,content_ja
>>en,ja
>>en:english ja:japanese
>>en
>>
>>
>>  
>>
>>  
>>
>>langid
>>
>>  
>>
>> Please suggest me the solution.
>>
>> Thanks,
>> Poornima
>>
>
>


[Solr Schema API] SolrJ Access

2014-07-08 Thread Alessandro Benedetti
Hi guys,
wondering if there is any proper way to access the Schema API via SolrJ.

Of course it is possible to reach it in Java with a specific HTTP request,
but in this way, using SolrCloud for example, we become coupled to one
specific instance (and we don't want that).

Code Example :

> HttpResponse response;
> String url = this.solrBase + "/" + core + SCHEMA_SOLR_FIELDS_ENDPOINT + fieldName;
> HttpPut httpPut = new HttpPut(url);
> StringEntity entity = new StringEntity(
>     "{\"type\":\"text_general\",\"stored\":\"true\"}",
>     ContentType.APPLICATION_JSON);
> httpPut.setEntity(entity);
> HttpClient client = new DefaultHttpClient();
> response = client.execute(httpPut);


Any suggestions?
In my opinion it would be interesting to have some auxiliary method in
SolrServer, if it's not there yet.

Cheers

-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Parallel optimize of index on SolrCloud.

2014-07-08 Thread Modassar Ather
Hi,

Need to optimize index created using CloudSolrServer APIs under SolrCloud
setup of 3 instances on separate machines. Currently it optimizes
sequentially if I invoke cloudSolrServer.optimize().

To make it parallel I tried making three separate HttpSolrServer instances
and invoked httpSolrServer.optimize() on them in parallel, but it still seems
to be doing the optimization sequentially.

I tried invoking optimize directly using HttpPost with following url and
parameters but still it seems to be sequential.
*URL* : http://host:port/solr/collection/update

*Parameters*:
params.add(new BasicNameValuePair("optimize", "true"));
params.add(new BasicNameValuePair("maxSegments", "1"));
params.add(new BasicNameValuePair("waitFlush", "true"));
params.add(new BasicNameValuePair("distrib", "false"));

Kindly provide your suggestion and help.

Regards,
Modassar
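
A minimal SolrJ sketch of the per-core approach discussed in this thread (the
core URL is an assumption; one such request would be run per shard, each in
its own thread):

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.UpdateRequest;

HttpSolrServer shard1 = new HttpSolrServer("http://host1:8983/solr/collection_shard1_replica1");
UpdateRequest req = new UpdateRequest();
req.setAction(AbstractUpdateRequest.ACTION.OPTIMIZE, true, true, 1);  // waitFlush, waitSearcher, maxSegments=1
req.setParam("distrib", "false");  // keep the optimize local to this core
shard1.request(req);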


Re: Solr irregularly having QTime > 50000ms, stracing solr cures the problem

2014-07-08 Thread Harald Kirsch

No, no full GC.

The JVM does nothing during the outages, no CPU, no GC, as checked with 
jvisualvm and htop.


Harald.

On 08.07.2014 10:12, Heyde, Ralf wrote:

My first assumption: full GC.
Can you please tell us about your JVM setup and maybe trace what happens
in the JVMs?
On Jul 8, 2014 9:54 AM, "Harald Kirsch"  wrote:


Hi all,

This is what happens when I run a regular wget query to log the current
number of documents indexed:

2014-07-08:07:23:28 QTime=20 numFound="5720168"
2014-07-08:07:24:28 QTime=12 numFound="5721126"
2014-07-08:07:25:28 QTime=19 numFound="5721126"
2014-07-08:07:27:18 QTime=50071 numFound="5721126"
2014-07-08:07:29:08 QTime=50058 numFound="5724494"
2014-07-08:07:30:58 QTime=50033 numFound="5730710"
2014-07-08:07:31:58 QTime=13 numFound="5730710"
2014-07-08:07:33:48 QTime=50065 numFound="5734069"
2014-07-08:07:34:48 QTime=16 numFound="5737742"
2014-07-08:07:36:38 QTime=50037 numFound="5737742"
2014-07-08:07:37:38 QTime=12 numFound="5738190"
2014-07-08:07:38:38 QTime=23 numFound="5741208"
2014-07-08:07:40:29 QTime=50034 numFound="5742067"
2014-07-08:07:41:29 QTime=12 numFound="5742067"
2014-07-08:07:42:29 QTime=17 numFound="5742067"
2014-07-08:07:43:29 QTime=20 numFound="5745497"
2014-07-08:07:44:29 QTime=13 numFound="5745981"
2014-07-08:07:45:29 QTime=23 numFound="5746420"

As you can see, the QTime is just over 50 seconds at irregular intervals.

This happens independent of whether I am indexing documents with around 20
dps or not. First I thought about a dependence on the auto-commit of 5
minutes, but the 50-second hits are too irregular.

Furthermore, and this is *really strange*: when hooking strace on the solr
process, the 50 seconds QTimes disappear completely and consistently --- a
real Heisenbug.

Nevertheless, strace shows that there is a socket timeout of 50 seconds
defined in calls like this:

[pid  1253] 09:09:37.857413 poll([{fd=96, events=POLLIN|POLLERR}], 1,
50000) = 1 ([{fd=96, revents=POLLIN}]) <0.40>

where the fd=96 is the result of

[pid 25446] 09:09:37.855235 accept(122, {sa_family=AF_INET,
sin_port=htons(57236), sin_addr=inet_addr("ip address of local host")},
[16]) = 96 <0.54>

where again fd=122 is the TCP port on which solr was started.

My hunch is that this is communication between the cores of solr.

I tried to search the internet for such a strange connection between
socket timeouts and strace, but could not find anything (the stackoverflow
entry from yesterday is my own :-(


This smells a bit like a race condition/deadlock kind of thing which is
broken up by timing differences introduced by stracing the process.

Any hints appreciated.

For completeness, here is my setup:
- solr-4.8.1,
- cloud version running
- 10 shards on 10 cores in one instance
- hosted on SUSE Linux Enterprise Server 11 (x86_64), VERSION 11,
PATCHLEVEL 2
- hosted on a vmware, 4 CPU cores, 16 GB RAM
- single digit million docs indexed, exact number does not matter
- zero query load


Harald.





--
Harald Kirsch
Raytion GmbH
Kaiser-Friedrich-Ring 74
40547 Duesseldorf
Fon +49 211 53883-216
Fax +49-211-550266-19
http://www.raytion.com


Re: Changing default behavior of solr for overwrite the whole document on uniquekey duplication

2014-07-08 Thread Ali Nazemian
Dear Himanshu,
Hi,
You misunderstood what I meant. I am not going to update some fields. I am
going to change what Solr does on duplication of the uniqueKey field. I don't want
Solr to overwrite the whole document; I just want to overwrite some parts of the
document. This situation does not come from the user side; this is what Solr does
to documents with a duplicated uniqueKey.
Regards.


On Tue, Jul 8, 2014 at 12:29 PM, Himanshu Mehrotra <
himanshu.mehro...@snapdeal.com> wrote:

> Please look at https://wiki.apache.org/solr/Atomic_Updates
>
> This does what you want: just update the relevant fields.
>
> Thanks,
> Himanshu
>
>
> On Tue, Jul 8, 2014 at 1:09 PM, Ali Nazemian 
> wrote:
>
> > Dears,
> > Hi,
> > According to my requirement I need to change the default behavior of Solr
> > of overwriting the whole document on unique-key duplication. I am going to
> > change it so that the overwrite affects just part of the document (some fields)
> > while other parts (other fields) remain unchanged. First of all I need to
> > know whether such a change in Solr behavior is possible? Second, I would really
> > appreciate it if you could guide me through what class/classes I should
> > consider for changing that?
> > Best regards.
> >
> > --
> > A.Nazemian
> >
>



-- 
A.Nazemian


Re: Solr irregularly having QTime > 50000ms, stracing solr cures the problem

2014-07-08 Thread Heyde, Ralf
My first assumption: full GC.
Can you please tell us about your JVM setup and maybe trace what happens
in the JVMs?
On Jul 8, 2014 9:54 AM, "Harald Kirsch"  wrote:

> Hi all,
>
> This is what happens when I run a regular wget query to log the current
> number of documents indexed:
>
> 2014-07-08:07:23:28 QTime=20 numFound="5720168"
> 2014-07-08:07:24:28 QTime=12 numFound="5721126"
> 2014-07-08:07:25:28 QTime=19 numFound="5721126"
> 2014-07-08:07:27:18 QTime=50071 numFound="5721126"
> 2014-07-08:07:29:08 QTime=50058 numFound="5724494"
> 2014-07-08:07:30:58 QTime=50033 numFound="5730710"
> 2014-07-08:07:31:58 QTime=13 numFound="5730710"
> 2014-07-08:07:33:48 QTime=50065 numFound="5734069"
> 2014-07-08:07:34:48 QTime=16 numFound="5737742"
> 2014-07-08:07:36:38 QTime=50037 numFound="5737742"
> 2014-07-08:07:37:38 QTime=12 numFound="5738190"
> 2014-07-08:07:38:38 QTime=23 numFound="5741208"
> 2014-07-08:07:40:29 QTime=50034 numFound="5742067"
> 2014-07-08:07:41:29 QTime=12 numFound="5742067"
> 2014-07-08:07:42:29 QTime=17 numFound="5742067"
> 2014-07-08:07:43:29 QTime=20 numFound="5745497"
> 2014-07-08:07:44:29 QTime=13 numFound="5745981"
> 2014-07-08:07:45:29 QTime=23 numFound="5746420"
>
> As you can see, the QTime is just over 50 seconds at irregular intervals.
>
> This happens independent of whether I am indexing documents with around 20
> dps or not. First I thought about a dependence on the auto-commit of 5
> minutes, but the 50-second hits are too irregular.
>
> Furthermore, and this is *really strange*: when hooking strace on the solr
> process, the 50 seconds QTimes disappear completely and consistently --- a
> real Heisenbug.
>
> Nevertheless, strace shows that there is a socket timeout of 50 seconds
> defined in calls like this:
>
> [pid  1253] 09:09:37.857413 poll([{fd=96, events=POLLIN|POLLERR}], 1,
> 50000) = 1 ([{fd=96, revents=POLLIN}]) <0.40>
>
> where the fd=96 is the result of
>
> [pid 25446] 09:09:37.855235 accept(122, {sa_family=AF_INET,
> sin_port=htons(57236), sin_addr=inet_addr("ip address of local host")},
> [16]) = 96 <0.54>
>
> where again fd=122 is the TCP port on which solr was started.
>
> My hunch is that this is communication between the cores of solr.
>
> I tried to search the internet for such a strange connection between
> socket timeouts and strace, but could not find anything (the stackoverflow
> entry from yesterday is my own :-(
>
>
> This smells a bit like a race condition/deadlock kind of thing which is
> broken up by timing differences introduced by stracing the process.
>
> Any hints appreciated.
>
> For completeness, here is my setup:
> - solr-4.8.1,
> - cloud version running
> - 10 shards on 10 cores in one instance
> - hosted on SUSE Linux Enterprise Server 11 (x86_64), VERSION 11,
> PATCHLEVEL 2
> - hosted on a vmware, 4 CPU cores, 16 GB RAM
> - single digit million docs indexed, exact number does not matter
> - zero query load
>
>
> Harald.
>


Re: SOLR on hdfs

2014-07-08 Thread shlash
Hi all,
I am new to Solr and HDFS. Actually, I am trying to index text content
extracted from binary files like PDF, MS Office, etc., which are stored on
HDFS (single node). So far I have Solr running on HDFS and have created the
core, but I couldn't send the files to Solr for indexing.
Can someone please help me to do that?

Thanks





Re: Changing default behavior of solr for overwrite the whole document on uniquekey duplication

2014-07-08 Thread Himanshu Mehrotra
Please look at https://wiki.apache.org/solr/Atomic_Updates

This does what you want: just update the relevant fields.

Thanks,
Himanshu


On Tue, Jul 8, 2014 at 1:09 PM, Ali Nazemian  wrote:

> Dears,
> Hi,
> According to my requirement I need to change the default behavior of Solr
> of overwriting the whole document on unique-key duplication. I am going to
> change it so that the overwrite affects just part of the document (some fields)
> while other parts (other fields) remain unchanged. First of all I need to
> know whether such a change in Solr behavior is possible? Second, I would really
> appreciate it if you could guide me through what class/classes I should
> consider for changing that?
> Best regards.
>
> --
> A.Nazemian
>
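
For reference, a minimal SolrJ sketch of an atomic update (this requires the
updateLog and a _version_ field in the schema; the core URL and field names
are assumptions):

import java.util.Collections;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "doc1");  // existing uniqueKey value
doc.addField("title", Collections.singletonMap("set", "new title"));  // only this field is replaced
server.add(doc);
server.commit();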


Solr irregularly having QTime > 50000ms, stracing solr cures the problem

2014-07-08 Thread Harald Kirsch

Hi all,

This is what happens when I run a regular wget query to log the current 
number of documents indexed:


2014-07-08:07:23:28 QTime=20 numFound="5720168"
2014-07-08:07:24:28 QTime=12 numFound="5721126"
2014-07-08:07:25:28 QTime=19 numFound="5721126"
2014-07-08:07:27:18 QTime=50071 numFound="5721126"
2014-07-08:07:29:08 QTime=50058 numFound="5724494"
2014-07-08:07:30:58 QTime=50033 numFound="5730710"
2014-07-08:07:31:58 QTime=13 numFound="5730710"
2014-07-08:07:33:48 QTime=50065 numFound="5734069"
2014-07-08:07:34:48 QTime=16 numFound="5737742"
2014-07-08:07:36:38 QTime=50037 numFound="5737742"
2014-07-08:07:37:38 QTime=12 numFound="5738190"
2014-07-08:07:38:38 QTime=23 numFound="5741208"
2014-07-08:07:40:29 QTime=50034 numFound="5742067"
2014-07-08:07:41:29 QTime=12 numFound="5742067"
2014-07-08:07:42:29 QTime=17 numFound="5742067"
2014-07-08:07:43:29 QTime=20 numFound="5745497"
2014-07-08:07:44:29 QTime=13 numFound="5745981"
2014-07-08:07:45:29 QTime=23 numFound="5746420"

As you can see, the QTime is just over 50 seconds at irregular intervals.

This happens independent of whether I am indexing documents with around 
20 dps or not. First I thought about a dependence on the auto-commit of 
5 minutes, but the 50-second hits are too irregular.


Furthermore, and this is *really strange*: when hooking strace on the 
solr process, the 50 seconds QTimes disappear completely and 
consistently --- a real Heisenbug.


Nevertheless, strace shows that there is a socket timeout of 50 seconds 
defined in calls like this:


[pid  1253] 09:09:37.857413 poll([{fd=96, events=POLLIN|POLLERR}], 1, 
50000) = 1 ([{fd=96, revents=POLLIN}]) <0.40>


where the fd=96 is the result of

[pid 25446] 09:09:37.855235 accept(122, {sa_family=AF_INET, 
sin_port=htons(57236), sin_addr=inet_addr("ip address of local host")}, 
[16]) = 96 <0.54>


where again fd=122 is the TCP port on which solr was started.

My hunch is that this is communication between the cores of solr.

I tried to search the internet for such a strange connection between 
socket timeouts and strace, but could not find anything (the 
stackoverflow entry from yesterday is my own :-(



This smells a bit like a race condition/deadlock kind of thing which is 
broken up by timing differences introduced by stracing the process.


Any hints appreciated.

For completeness, here is my setup:
- solr-4.8.1,
- cloud version running
- 10 shards on 10 cores in one instance
- hosted on SUSE Linux Enterprise Server 11 (x86_64), VERSION 11, 
PATCHLEVEL 2

- hosted on a vmware, 4 CPU cores, 16 GB RAM
- single digit million docs indexed, exact number does not matter
- zero query load


Harald.


Changing default behavior of solr for overwrite the whole document on uniquekey duplication

2014-07-08 Thread Ali Nazemian
Dears,
Hi,
According to my requirement I need to change the default behavior of Solr
of overwriting the whole document on unique-key duplication. I am going to
change it so that the overwrite affects just part of the document (some fields)
while other parts (other fields) remain unchanged. First of all I need to
know whether such a change in Solr behavior is possible? Second, I would really
appreciate it if you could guide me through what class/classes I should consider
for changing that?
Best regards.

-- 
A.Nazemian